Tuesday, February 21, 2023

Sysbench, Arm & x86, public cloud

This has results for sysbench with MyRocks, MySQL/InnoDB and Postgres using public cloud servers. The goal is to document how relative performance changes from single-threaded to high-concurrency workloads. 

By relative performance I mean that I present performance as a ratio: throughput on an Arm server / throughput on an x86 server.  By throughput I mean QPS for all of the workloads except for scan, which does a full scan, and throughput for scan is millions of rows read/second.
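A minimal sketch of the ratio described above. The function name and the QPS values are hypothetical and are not taken from the benchmark results.

```python
def throughput_ratio(arm_throughput: float, x86_throughput: float) -> float:
    """Relative performance: Arm throughput divided by x86 throughput.

    Throughput is QPS for most workloads and millions of rows read/second
    for scan. Both inputs must use the same unit.
    """
    if x86_throughput <= 0:
        raise ValueError("x86 throughput must be positive")
    return arm_throughput / x86_throughput


# Hypothetical numbers for illustration only.
ratio = throughput_ratio(arm_throughput=8500.0, x86_throughput=10000.0)
print(f"{ratio:.2f}")  # a value below 1 means the x86 server was faster
```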

I compare c7g vs c6i instance types from AWS and then t2a vs c2 server types from GCP. I am comparing absolute performance rather than price performance. For AWS the c7g list price is a bit lower than c6i while for GCP the t2a and c2 list prices are about the same.

Regardless, my goal is only to understand how relative throughput changes with concurrency. Of course, I also look for perf bugs that might be fixable.

tl;dr:

  • Results for MyRocks have the most variance and for Postgres have the least variance
  • x86 usually gets more throughput than Arm although the advantage is smaller to non-existent when the CPU is fully/over-subscribed. A common throughput ratio for Arm was between 0.7 and 0.9 meaning that Arm got between 70% and 90% of the throughput (QPS) compared to an x86 in the same price range. Alas, there is much detail and nuance. In some cases the Arm server got the same or better throughput. Relatively, the Arm does much better when the CPU is saturated (number of concurrent requests and background threads >= number of Arm cores) because it has more cores but the per-core perf on the x86 is better.
  • InnoDB does the worst on Arm relative to x86 with AWS (c7g vs c6i) as the workload & server size are scaled up for write-heavy workloads. MyRocks is similar but not as bad in this respect.
  • The advantage for x86 vs Arm is smaller on the write-heavy benchmark steps, especially with the workloads where the CPU is fully/over-subscribed
  • The advantage for x86 vs Arm is larger when comparing t2a vs c2 (GCP) than c7g vs c6i (AWS)
Note that I disable hyperthreading on the x86 servers because from past experience the use of hyperthreading hurts DBMS performance. I don't reevaluate this (hyperthreading enabled vs disabled) for every benchmark because that takes more time but I will revisit the topic. The impact of disabling hyperthreading is that the Arm server has 2X more cores than the comparable x86. For example, the 16xlarge instances on AWS (c7g is Arm, c6i is x86) have 64 vcpu: there are 64 cores on Arm, 64 CPUs (hardware threads on 32 cores) on x86 with hyperthreading enabled and 32 cores on x86 with hyperthreading disabled.
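The core-count arithmetic for the 16xlarge example can be sketched as follows. The function name and the threads_per_core parameter are assumptions for illustration: 1 thread per core on Arm and 2 on x86.

```python
def usable_cpus(vcpus: int, threads_per_core: int, smt_enabled: bool) -> int:
    """CPUs the OS can schedule on, given the vcpu count of the instance.

    With SMT (hyperthreading) enabled every vcpu is usable. With it
    disabled only one hardware thread per physical core remains.
    """
    if smt_enabled:
        return vcpus
    return vcpus // threads_per_core


# 16xlarge on AWS: both instance types are sold with 64 vcpu.
arm = usable_cpus(64, threads_per_core=1, smt_enabled=False)     # c7g
x86_ht_on = usable_cpus(64, threads_per_core=2, smt_enabled=True)   # c6i
x86_ht_off = usable_cpus(64, threads_per_core=2, smt_enabled=False)  # c6i, as tested

print(arm, x86_ht_on, x86_ht_off)
print(arm // x86_ht_off)  # Arm has 2X the cores once hyperthreading is off
```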

Benchmark

I use upstream sysbench with a few extra Lua scripts from here driven by shell scripts from here. Read the Lua scripts to see what SQL statements are used. For example, the scan query does a full scan with a WHERE clause that filters all rows on the DBMS and returns 0 rows (and read this fun bug to learn what the MongoDB planner does for such queries).

There are 42 Lua scripts for which I provide results. Each of them represents a (micro)benchmark step that was run for 10 minutes. I place them into three groups -- point, range, write -- based on the common operation done for each where point does point queries, range does range queries and write does insert/update/delete.

The benchmark is invoked by this script (r.sh) which then invokes cmp_pg.sh, cmp_in.sh and cmp_rx.sh. All scripts end up calling all_small.sh to run sysbench for a sequence of Lua scripts. Each Lua script was run for 10 minutes using 8 tables with 32 clients (threads, connections). Note that the scripts linked above (r.sh, cmp_??.sh) are similar to the scripts I used but I had to edit them for different levels of concurrency and server size.

In all cases there were 20M rows/table but the number of tables increased with the server size. The benchmark was run twice -- once with most queries done against a primary key index and then again with a secondary index used for most queries (sysbench --secondary).

I used Postgres 15.1, MySQL 8.0.31 for InnoDB and FB MySQL 8.0.28 for MyRocks.

The test database always fit in the database buffer pool (or RocksDB block cache). 

For each server size the benchmark is repeated twice: first with a number of clients so that the CPU is under-subscribed (with respect to the number of x86 cores) and second with a number of clients so that the CPU is fully subscribed for queries and possibly over-subscribed for writes. For example, the 2xlarge AWS instances both have 8 vcpu, but the c6i has 4 cores (hyperthreading disabled) and the c7g has 8 cores. The test was run with 1 and 4 clients. With 4 clients the x86 CPU is fully subscribed for queries and over-subscribed for writes because background threads must also run for b-tree checkpoint or LSM compaction.
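A rough sketch of how the client counts relate to the x86 core counts. The function name is hypothetical and the background_threads value of 1 is an assumption to stand in for checkpoint or compaction work, not a measured number.

```python
def subscription(clients: int, cores: int, background_threads: int = 0) -> str:
    """Classify CPU load by comparing busy threads to available cores."""
    busy = clients + background_threads
    if busy < cores:
        return "under-subscribed"
    if busy == cores:
        return "fully subscribed"
    return "over-subscribed"


# (server, x86 cores with hyperthreading disabled, low clients, high clients)
configs = [
    ("AWS 2xlarge", 4, 1, 4),
    ("AWS 8xlarge", 16, 8, 16),
    ("AWS 16xlarge", 32, 16, 32),
    ("GCP c2-standard-8", 4, 1, 4),
    ("GCP c2-standard-30", 15, 8, 15),
]

for name, cores, low, high in configs:
    print(name,
          "| low:", subscription(low, cores),
          "| high, queries:", subscription(high, cores),
          "| high, writes:", subscription(high, cores, background_threads=1))
```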

The impact of under-subscribed vs fully/over-subscribed is visible on the graphs. The difference between x86 and Arm is larger on the read-heavy microbenchmarks because the x86 instances have better per-core performance than Arm. On the write-heavy tests the difference is smaller.

Hardware

Tests were repeated on AWS and GCP servers. For AWS I used c7g (Arm) and c6i (x86). For GCP I used t2a (Arm) and c2 (x86). The c7g and c6i servers are closer in performance than the t2a and c2. Benchmarks were repeated for several instance/server sizes.

All servers used Ubuntu 22.04. I did not enable huge pages for Postgres or InnoDB.

AWS
  • 2xlarge
    • Has 8 vcpu and 16G RAM. There are 8 cores on c7g and 4 on c6i because I disabled hyperthreading. Both servers used EBS for the database (io2, 256G, 10K IOPs).
    • Sysbench was run with 1 table and 20M rows for 1 and 4 clients. 
    • The database config files are here for MyRocks, for Postgres and for InnoDB.
  • 8xlarge
    • Has 32 vcpu and 64G RAM. There are 32 cores on c7g and 16 on c6i because I disabled hyperthreading. Both servers used EBS for the database (io2, 1TB, 49K IOPs).
    • Sysbench was run with 4 tables and 20M rows/table for 8 and 16 clients. 
    • The database config files are here for MyRocks, for Postgres and for InnoDB.
  • 16xlarge
    • Has 64 vcpu and 128G RAM. There are 64 cores on c7g and 32 on c6i because I disabled hyperthreading. Both servers used EBS for the database (io2, 2TB, 49K IOPs).
    • Sysbench was run with 8 tables and 20M rows/table for 16 and 32 clients. 
    • The database config files are here for MyRocks, for Postgres and for InnoDB.
GCP
  • t2a-standard-8, c2-standard-8
    • Both have 8 vcpu and 32G RAM. There are 8 cores on t2a and 4 on c2 because I disabled hyperthreading. The database was on SSD Persistent disk (400G).
    • Sysbench was run with 1 table and 20M rows for 1 and 4 clients. 
    • The database config files are here for MyRocks, for Postgres and for InnoDB.
  • t2a-standard-32, c2-standard-30
    • The t2a instance has 128G RAM, 32 vcpu and 32 cores. The c2 instance has 120G RAM, 30 vcpu and 15 cores because I disabled hyperthreading. The database was on SSD Persistent disk (2TB).
    • Sysbench was run with 4 tables and 20M rows/table for 8 and 15 clients.
    • The database config files are here for MyRocks, for Postgres and for InnoDB.
Results

Each benchmark consists of 42 (micro)benchmark steps. Each benchmark step is a Lua script and most scripts just test one operation (point query, range query, insert, update, etc) thus it is best to think of them as microbenchmarks. The read-only and read-write benchmark steps are the traditional sysbench workload and consist of different SELECT statements that use range scans and sometimes do aggregation.

Below there are charts for each benchmark that show the relative performance for each benchmark step as: Arm throughput / x86 throughput. The Arm server is faster when this ratio is larger than 1. Alas, the ratios are almost always less than 1 because the x86 server was usually faster.

The x-axis starts at 0.4 or 0.5 rather than 0.0 to make it easier to see the differences. The spreadsheet is here for AWS and for GCP. It is easier to see detail on the spreadsheet than on the inline images below.

For each benchmark step there is a pair of results named pk0 and pk1. By pk1 I mean that most queries used the primary key index. By pk0 I mean that most queries used the secondary index (sysbench --secondary).

Result sections

Each graph below has 42 pairs of lines -- each pair has a result for pk0 and pk1. So I scanned all of them and attempted to draw some conclusions using eyeball stats rather than the real thing. The last section of this blog post has summary statistics (min, max, average, median, standard deviation) for each of the microbenchmark groups (point, range, writes).

On each graph the 42 microbenchmarks are in three groups from left to right (Point, Range, Write). The Point group is for point queries, the Range group is for range queries and the Write group is for writes.

In the sections that follow where I describe whether x86 does better than, similar to or worse than Arm the judgement is mostly based on the average values for the throughput ratio with help from the median values. Generally, if the ratio is between 0.95 and 1.05 I claim the throughput is similar.
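The rule above can be sketched as a small function. Treating the 0.95 and 1.05 boundaries as inclusive is an assumption, and the function name is hypothetical. The example ratios are the pk0 averages from the summary statistics for 8 vcpu, 1 client, AWS.

```python
def compare_x86_to_arm(ratio: float, low: float = 0.95, high: float = 1.05) -> str:
    """Turn an Arm/x86 throughput ratio into the wording used in the results."""
    if ratio < low:
        return "x86 is better than Arm"
    if ratio > high:
        return "x86 is worse than Arm"
    return "x86 is similar to Arm"


# pk0 averages, 8 vcpu, 1 client, AWS
print(compare_x86_to_arm(0.91))  # MyRocks, point queries
print(compare_x86_to_arm(1.02))  # InnoDB, point queries
print(compare_x86_to_arm(1.19))  # InnoDB, writes
```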

Results: 8 vcpu, 1 client, AWS

The CPU was under-subscribed.
  • MyRocks has the most variance
  • For queries
    • x86 is better than Arm for MyRocks and Postgres
    • x86 is similar to Arm for InnoDB
  • For writes
    • x86 is better than Arm for MyRocks and Postgres
    • x86 is worse than Arm for InnoDB
The graphs are also on the spreadsheet.
Results: 8 vcpu, 4 clients, AWS

The CPU is fully/over-subscribed during the write-heavy benchmark steps.
  • MyRocks has the most variance
  • For queries
    • x86 is better than Arm
  • For writes
    • x86 is worse than Arm for Postgres and InnoDB
    • x86 is similar to Arm for MyRocks
    • For writes the advantage shifts towards Arm given the CPU is fully/over-subscribed
The graphs are also on the spreadsheet.
Results: 32 vcpu, 8 clients, AWS

The CPU was under-subscribed.
  • MyRocks has the most variance
  • For queries
    • x86 is better than Arm except for MyRocks+pk1 where they are similar
  • For writes
    • x86 is better than Arm for MyRocks and InnoDB
    • x86 is similar to Arm for Postgres
The graphs are also on the spreadsheet.
Results: 32 vcpu, 16 clients, AWS

The CPU is fully/over-subscribed during the write-heavy benchmark steps.
  • MyRocks has the most variance
  • Compared to the under-subscribed case above the advantage shifts towards Arm here
  • For point queries
    • x86 is better than Arm for Postgres and InnoDB
    • x86 is worse than Arm for MyRocks with pk0 and similar with pk1
  • For range queries
    • x86 is better than Arm for Postgres and InnoDB
    • x86 is similar to Arm for MyRocks with pk0 and better with pk1
  • For writes
    • x86 is worse than Arm for Postgres and InnoDB
    • x86 is similar to Arm for MyRocks with pk0 and worse than Arm for pk1
The graphs are also on the spreadsheet.
Results: 64 vcpu, 16 clients, AWS

The CPU was under-subscribed.
  • MyRocks has the most variance
  • For point queries
    • x86 is better than Arm except for MyRocks+pk0 where they are similar
  • For range queries
    • x86 is better than Arm
  • For writes
    • x86 is better than Arm for MyRocks and InnoDB.
    • For InnoDB the x86 advantage increases as the instance size increases from 2xl to 8xl to 16xl here. This is true for both the under-subscribed and fully/over-subscribed workloads although the change appears to be worse in the under-subscribed CPU case.
    • x86 is similar to Arm for Postgres
The graphs are also on the spreadsheet.
Results: 64 vcpu, 32 clients, AWS

The CPU is fully/over-subscribed during the write-heavy benchmark steps.
  • MyRocks has the most variance
  • For point queries
    • x86 is better than Arm for Postgres and InnoDB
    • x86 is similar to Arm for MyRocks with pk0 and worse with pk1
  • For range queries
    • x86 is better than Arm for Postgres and InnoDB
    • x86 is similar to Arm for MyRocks with pk0 and worse with pk1
  • For writes
    • x86 is better than Arm for MyRocks and InnoDB
    • x86 is worse than Arm for Postgres
The graphs are also on the spreadsheet.
Results: 8 vcpu, 1 client, GCP

The CPU was under-subscribed.
  • For queries
    • x86 is better than Arm
  • For writes
    • x86 is better than Arm
The graphs are also on the spreadsheet.
Results: 8 vcpu, 4 clients, GCP

The CPU is fully/over-subscribed during the write-heavy benchmark steps.
  • For queries
    • x86 is better than Arm
  • For writes
    • x86 is better than Arm
The graphs are also on the spreadsheet.
Results: 30/32 vcpu, 8 clients, GCP

The CPU was under-subscribed.
  • For queries
    • x86 is better than Arm
  • For writes
    • x86 is better than Arm
The graphs are also on the spreadsheet.
Results: 30/32 vcpu, 15 clients, GCP

The CPU is fully/over-subscribed during the write-heavy benchmark steps.
  • For queries
    • x86 is better than Arm
  • For writes
    • x86 is better than Arm
The graphs are also on the spreadsheet.
Summary statistics

The summary statistics group the microbenchmarks into 3 groups:
  • Point - benchmarks that do point queries
  • Range - benchmarks that do range queries, including one that does a full range scan
  • Write - benchmarks that do writes
The statistics are computed for the throughput ratio which is: (throughput for Arm / throughput for x86) and throughput is QPS for all tests except scan. For scan the throughput is millions of rows read/second. The throughput for Arm is better than x86 when the ratio is greater than 1.
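A minimal sketch of how such statistics can be computed for one group of ratios. The input list is hypothetical, and the use of the sample standard deviation (statistics.stdev) rather than the population form is an assumption because the post does not say which was used.

```python
import statistics


def summarize(ratios: list[float]) -> dict[str, float]:
    """Summary statistics for a group of Arm/x86 throughput ratios."""
    return {
        "min": min(ratios),
        "max": max(ratios),
        "avg": statistics.mean(ratios),
        "median": statistics.median(ratios),
        "stddev": statistics.stdev(ratios),  # sample standard deviation
    }


# Hypothetical ratios for the benchmark steps in one group.
point_ratios = [0.8, 0.9, 1.0]
for name, value in summarize(point_ratios).items():
    print(f"{name}: {value:.3f}")
```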

For each table there are results for pk0 and pk1. The pk1 results are the normal way to run sysbench -- most queries use the PK index. The pk0 results are from sysbench --secondary so that most queries use a secondary index (which is created on the same columns as the PK index had been created).

Summary statistics: 8 vcpu, 1 client, AWS

For MyRocks:

MyRocks       Point   Range   Write
min: pk0      0.81    0.79    0.75
max: pk0      1.12    1.07    0.89
avg: pk0      0.91    0.87    0.83
median: pk0   0.90    0.86    0.84
stddev: pk0   0.086   0.071   0.049
min: pk1      0.80    0.79    0.73
max: pk1      0.98    0.92    0.97
avg: pk1      0.90    0.86    0.84
median: pk1   0.88    0.86    0.83
stddev: pk1   0.049   0.039   0.075

For Postgres:

Postgres      Point   Range   Write
min: pk0      0.82    0.78    0.80
max: pk0      0.94    0.93    0.92
avg: pk0      0.90    0.87    0.88
median: pk0   0.91    0.87    0.88
stddev: pk0   0.031   0.039   0.035
min: pk1      0.83    0.78    0.76
max: pk1      0.96    0.92    0.89
avg: pk1      0.90    0.84    0.84
median: pk1   0.90    0.84    0.84
stddev: pk1   0.032   0.044   0.039

For InnoDB:
InnoDB        Point   Range   Write
min: pk0      0.96    0.93    1.09
max: pk0      1.29    1.21    1.30
avg: pk0      1.02    1.05    1.19
median: pk0   0.97    1.04    1.21
stddev: pk0   0.106   0.086   0.066
min: pk1      0.93    0.92    1.06
max: pk1      1.26    1.17    1.29
avg: pk1      1.00    1.01    1.17
median: pk1   0.96    1.00    1.20
stddev: pk1   0.098   0.083   0.070

Summary statistics: 8 vcpu, 4 clients, AWS

For MyRocks:

MyRocks       Point   Range   Write
min: pk0      0.67    0.72    0.82
max: pk0      1.06    1.03    1.04
avg: pk0      0.82    0.84    0.93
median: pk0   0.80    0.84    0.94
stddev: pk0   0.104   0.084   0.075
min: pk1      0.78    0.77    0.81
max: pk1      1.09    1.61    1.09
avg: pk1      0.92    0.93    0.96
median: pk1   0.92    0.90    0.97
stddev: pk1   0.076   0.200   0.085

For Postgres:

Postgres      Point   Range   Write
min: pk0      0.72    0.70    0.72
max: pk0      0.82    0.85    2.51
avg: pk0      0.78    0.77    1.27
median: pk0   0.80    0.76    1.13
stddev: pk0   0.036   0.043   0.513
min: pk1      0.65    0.70    0.77
max: pk1      0.82    0.84    2.24
avg: pk1      0.77    0.75    1.22
median: pk1   0.80    0.73    1.19
stddev: pk1   0.057   0.052   0.406

For InnoDB:

InnoDB        Point   Range   Write
min: pk0      0.79    0.79    0.84
max: pk0      0.92    0.99    1.27
avg: pk0      0.86    0.87    1.10
median: pk0   0.86    0.88    1.12
stddev: pk0   0.036   0.057   0.157
min: pk1      0.76    0.79    0.84
max: pk1      0.92    1.00    1.28
avg: pk1      0.86    0.88    1.11
median: pk1   0.87    0.89    1.13
stddev: pk1   0.045   0.059   0.159

Summary statistics: 32 vcpu, 8 clients, AWS

For MyRocks:

MyRocks       Point   Range   Write
min: pk0      0.85    0.83    0.74
max: pk0      1.48    1.74    1.13
avg: pk0      1.05    1.00    0.89
median: pk0   0.99    0.95    0.84
stddev: pk0   0.158   0.218   0.136
min: pk1      0.78    0.76    0.64
max: pk1      1.18    1.07    0.82
avg: pk1      0.94    0.87    0.74
median: pk1   0.90    0.86    0.74
stddev: pk1   0.127   0.081   0.056

For Postgres:

Postgres      Point   Range   Write
min: pk0      0.78    0.84    0.90
max: pk0      1.10    1.04    1.13
avg: pk0      0.86    0.91    1.03
median: pk0   0.82    0.89    1.03
stddev: pk0   0.096   0.064   0.075
min: pk1      0.71    0.77    0.93
max: pk1      1.12    1.04    1.15
avg: pk1      0.85    0.91    1.05
median: pk1   0.83    0.87    1.06
stddev: pk1   0.110   0.087   0.064

For InnoDB:

InnoDB        Point   Range   Write
min: pk0      0.76    0.79    0.72
max: pk0      0.91    0.90    0.98
avg: pk0      0.86    0.86    0.83
median: pk0   0.86    0.86    0.77
stddev: pk0   0.033   0.028   0.112
min: pk1      0.72    0.71    0.73
max: pk1      0.97    0.92    0.96
avg: pk1      0.79    0.84    0.81
median: pk1   0.76    0.85    0.75
stddev: pk1   0.072   0.066   0.098

Summary statistics: 32 vcpu, 16 clients, AWS

For MyRocks:

MyRocks       Point   Range   Write
min: pk0      0.85    0.83    0.83
max: pk0      1.73    2.28    1.29
avg: pk0      1.24    1.09    1.04
median: pk0   1.14    1.05    0.99
stddev: pk0   0.244   0.350   0.167
min: pk1      0.77    0.76    0.81
max: pk1      1.32    1.28    1.05
avg: pk1      1.06    0.91    0.91
median: pk1   1.03    0.89    0.88
stddev: pk1   0.168   0.140   0.079

For Postgres:

Postgres      Point   Range   Write
min: pk0      0.75    0.75    0.79
max: pk0      0.91    0.97    1.45
avg: pk0      0.81    0.82    1.17
median: pk0   0.81    0.80    1.24
stddev: pk0   0.044   0.065   0.219
min: pk1      0.75    0.74    0.86
max: pk1      1.06    1.07    1.55
avg: pk1      0.91    0.89    1.21
median: pk1   0.91    0.88    1.24
stddev: pk1   0.091   0.093   0.204

For InnoDB:

InnoDB        Point   Range   Write
min: pk0      0.66    0.69    0.78
max: pk0      0.87    0.87    2.34
avg: pk0      0.79    0.76    1.21
median: pk0   0.81    0.76    1.00
stddev: pk0   0.060   0.053   0.512
min: pk1      0.74    0.75    0.86
max: pk1      0.91    0.89    2.03
avg: pk1      0.83    0.79    1.12
median: pk1   0.85    0.76    0.97
stddev: pk1   0.053   0.048   0.343

Summary statistics: 64 vcpu, 16 clients, AWS

For MyRocks:

MyRocks       Point   Range   Write
min: pk0      0.74    0.73    0.62
max: pk0      1.11    1.37    1.03
avg: pk0      0.95    0.90    0.79
median: pk0   0.95    0.88    0.71
stddev: pk0   0.098   0.151   0.155
min: pk1      0.70    0.69    0.57
max: pk1      0.96    1.70    0.90
avg: pk1      0.85    0.87    0.74
median: pk1   0.84    0.82    0.70
stddev: pk1   0.083   0.242   0.119

For Postgres:

Postgres      Point   Range   Write
min: pk0      0.75    0.76    0.84
max: pk0      0.90    0.92    1.75
avg: pk0      0.83    0.83    1.02
median: pk0   0.82    0.82    0.95
stddev: pk0   0.033   0.040   0.266
min: pk1      0.75    0.74    0.74
max: pk1      0.90    0.90    1.72
avg: pk1      0.82    0.81    0.97
median: pk1   0.82    0.80    0.91
stddev: pk1   0.034   0.045   0.273

For InnoDB:

InnoDB        Point   Range   Write
min: pk0      0.70    0.72    0.53
max: pk0      0.82    0.81    0.79
avg: pk0      0.77    0.76    0.64
median: pk0   0.80    0.77    0.58
stddev: pk0   0.042   0.030   0.111
min: pk1      0.64    0.70    0.48
max: pk1      0.84    0.82    0.74
avg: pk1      0.77    0.73    0.60
median: pk1   0.79    0.72    0.54
stddev: pk1   0.062   0.038   0.110

Summary statistics: 64 vcpu, 32 clients, AWS

For MyRocks:

MyRocks       Point   Range   Write
min: pk0      0.76    0.75    0.58
max: pk0      1.11    1.38    1.06
avg: pk0      0.93    1.05    0.84
median: pk0   0.95    1.09    0.79
stddev: pk0   0.081   0.209   0.150
min: pk1      0.71    0.74    0.53
max: pk1      2.45    3.89    1.46
avg: pk1      1.28    1.54    0.89
median: pk1   1.03    1.51    0.77
stddev: pk1   0.485   0.834   0.278

For Postgres:

Postgres      Point   Range   Write
min: pk0      0.74    0.75    0.85
max: pk0      0.94    1.03    2.14
avg: pk0      0.84    0.83    1.17
median: pk0   0.85    0.82    1.08
stddev: pk0   0.056   0.079   0.374
min: pk1      0.72    0.74    0.74
max: pk1      0.94    1.02    2.14
avg: pk1      0.84    0.82    1.12
median: pk1   0.85    0.83    1.06
stddev: pk1   0.057   0.074   0.388

For InnoDB:

InnoDB        Point   Range   Write
min: pk0      0.67    0.71    0.55
max: pk0      0.86    0.84    1.12
avg: pk0      0.79    0.77    0.87
median: pk0   0.81    0.77    0.86
stddev: pk0   0.058   0.035   0.175
min: pk1      0.64    0.70    0.57
max: pk1      0.88    0.90    1.08
avg: pk1      0.80    0.76    0.83
median: pk1   0.83    0.73    0.82
stddev: pk1   0.072   0.064   0.157

Summary statistics: 8 vcpu, 1 client, GCP

For MyRocks:

MyRocks       Point   Range   Write
min: pk0      0.60    0.57    0.57
max: pk0      0.87    0.95    0.75
avg: pk0      0.75    0.68    0.67
median: pk0   0.76    0.66    0.67
stddev: pk0   0.063   0.093   0.047
min: pk1      0.54    0.59    0.56
max: pk1      0.82    0.77    0.78
avg: pk1      0.74    0.67    0.65
median: pk1   0.75    0.67    0.66
stddev: pk1   0.075   0.064   0.063

For Postgres:

Postgres      Point   Range   Write
min: pk0      0.70    0.70    0.66
max: pk0      0.94    0.89    0.75
avg: pk0      0.80    0.77    0.70
median: pk0   0.81    0.75    0.70
stddev: pk0   0.070   0.055   0.030
min: pk1      0.70    0.72    0.65
max: pk1      0.81    0.90    0.75
avg: pk1      0.75    0.78    0.71
median: pk1   0.74    0.76    0.72
stddev: pk1   0.035   0.052   0.033

For InnoDB:

InnoDB        Point   Range   Write
min: pk0      0.57    0.60    0.56
max: pk0      0.75    0.77    0.71
avg: pk0      0.70    0.68    0.63
median: pk0   0.71    0.68    0.62
stddev: pk0   0.055   0.056   0.053
min: pk1      0.56    0.60    0.59
max: pk1      0.99    0.94    0.72
avg: pk1      0.78    0.71    0.67
median: pk1   0.73    0.69    0.68
stddev: pk1   0.137   0.102   0.041

Summary statistics: 8 vcpu, 4 clients, GCP

For MyRocks:

MyRocks       Point   Range   Write
min: pk0      0.54    0.52    0.69
max: pk0      0.88    2.77    0.91
avg: pk0      0.71    0.80    0.77
median: pk0   0.70    0.67    0.74
stddev: pk0   0.093   0.552   0.074
min: pk1      0.55    0.51    0.59
max: pk1      0.97    0.74    0.97
avg: pk1      0.70    0.62    0.77
median: pk1   0.69    0.63    0.77
stddev: pk1   0.101   0.073   0.100

For Postgres:

Postgres      Point   Range   Write
min: pk0      0.61    0.61    0.69
max: pk0      0.84    0.96    0.86
avg: pk0      0.72    0.73    0.79
median: pk0   0.71    0.70    0.80
stddev: pk0   0.061   0.104   0.053
min: pk1      0.60    0.62    0.67
max: pk1      0.85    0.92    0.86
avg: pk1      0.70    0.71    0.78
median: pk1   0.70    0.68    0.79
stddev: pk1   0.058   0.104   0.066

For InnoDB:

InnoDB        Point   Range   Write
min: pk0      0.50    0.55    0.67
max: pk0      0.86    0.89    1.64
avg: pk0      0.72    0.67    0.95
median: pk0   0.75    0.65    0.85
stddev: pk0   0.102   0.119   0.284
min: pk1      0.46    0.50    0.62
max: pk1      0.73    0.86    1.54
avg: pk1      0.64    0.59    0.87
median: pk1   0.67    0.53    0.79
stddev: pk1   0.087   0.115   0.267

Summary statistics: 30/32 vcpu, 8 clients, GCP

For MyRocks:

MyRocks       Point   Range   Write
min: pk0      0.50    0.47    0.49
max: pk0      0.94    1.24    0.84
avg: pk0      0.72    0.68    0.63
median: pk0   0.71    0.65    0.59
stddev: pk0   0.103   0.182   0.105
min: pk1      0.46    0.47    0.46
max: pk1      0.81    0.73    0.84
avg: pk1      0.68    0.59    0.59
median: pk1   0.73    0.59    0.56
stddev: pk1   0.104   0.090   0.109

For Postgres:

Postgres      Point   Range   Write
min: pk0      0.64    0.64    0.61
max: pk0      0.75    0.85    0.78
avg: pk0      0.68    0.71    0.69
median: pk0   0.68    0.67    0.70
stddev: pk0   0.033   0.069   0.050
min: pk1      0.65    0.64    0.55
max: pk1      0.74    0.86    0.77
avg: pk1      0.68    0.70    0.67
median: pk1   0.68    0.68    0.69
stddev: pk1   0.027   0.074   0.061

For InnoDB:

InnoDB        Point   Range   Write
min: pk0      0.48    0.54    0.50
max: pk0      0.78    0.83    0.76
avg: pk0      0.66    0.65    0.60
median: pk0   0.69    0.64    0.56
stddev: pk0   0.087   0.107   0.089
min: pk1      0.47    0.52    0.45
max: pk1      0.73    0.89    0.70
avg: pk1      0.64    0.60    0.56
median: pk1   0.67    0.54    0.53
stddev: pk1   0.084   0.106   0.076

Summary statistics: 30/32 vcpu, 15 clients, GCP

For MyRocks:

MyRocks       Point   Range   Write
min: pk0      0.51    0.44    0.55
max: pk0      0.77    1.25    1.11
avg: pk0      0.65    0.70    0.70
median: pk0   0.65    0.65    0.66
stddev: pk0   0.072   0.216   0.155
min: pk1      0.52    0.53    0.50
max: pk1      0.98    0.79    1.04
avg: pk1      0.75    0.65    0.71
median: pk1   0.78    0.64    0.66
stddev: pk1   0.113   0.083   0.155

For Postgres:

Postgres      Point   Range   Write
min: pk0      0.60    0.62    0.72
max: pk0      0.81    0.98    1.03
avg: pk0      0.73    0.73    0.87
median: pk0   0.74    0.70    0.83
stddev: pk0   0.056   0.098   0.111
min: pk1      0.60    0.61    0.70
max: pk1      0.81    0.95    1.02
avg: pk1      0.73    0.73    0.87
median: pk1   0.75    0.71    0.90
stddev: pk1   0.062   0.103   0.111

For InnoDB:

InnoDB        Point   Range   Write
min: pk0      0.49    0.55    0.50
max: pk0      0.79    0.87    1.72
avg: pk0      0.67    0.68    0.92
median: pk0   0.70    0.67    0.87
stddev: pk0   0.084   0.106   0.328
min: pk1      0.47    0.52    0.50
max: pk1      0.77    0.90    1.53
avg: pk1      0.68    0.64    0.83
median: pk1   0.71    0.58    0.75
stddev: pk1   0.094   0.114   0.275
