Monday, October 21, 2024

MySQL 8.0.40 does not fix the regressions I hoped it would fix

Performance regressions arrived in InnoDB with MySQL 8.0.30. Eventually multiple bugs were filed. The worst regressions were from changes to the hash function (perhaps fixed in 8.0.36) and from changes to how functions are inlined for InnoDB (bug 111538). The problems are obvious if you run CPU-bound workloads, and my CPU-bound workload is sysbench with a cached database.

Bug 111538 is now closed and marked as fixed in 8.0.40. Alas, there are still significant CPU perf regressions in 8.0.40 relative to 8.0.28. My advice to upstream is to stop innovating if you don't have the CI setup to catch the new performance problems that your innovation creates. Using something like Nyrkio would help.

This post has results from sysbench on several servers using MySQL 8.0.28, 8.0.30, 8.0.33, 8.0.39 and 8.0.40 to show there are large regressions starting in 8.0.30, many of which are still there in 8.0.40. Tests were repeated on 5 different servers.

tl;dr

  • SELECT statements with a large in-list use much less CPU starting in MySQL 8.0.31 because bug 102037 was fixed. I found that via sysbench and filed a bug vs 8.0.22
  • bug 111538 should not have been closed as fixed
  • The scan microbenchmark still has regressions from 8.0.28 to 8.0.40
    • For all servers the QPS is less in 8.0.40 than in 8.0.28
    • On 4 of 5 servers the QPS is less in 8.0.40 than in 8.0.30
  • The update microbenchmark still has regressions from 8.0.28 to 8.0.40
    • In all cases the QPS is less in 8.0.40 than in 8.0.28
    • From 8.0.30 to 8.0.40 -- on 2 servers the QPS is less in 8.0.40, on 2 it is about the same and one 1 it has improved 
    • Regressions after 8.0.28 are bad for the small servers (beelink and pn53 below) and really bad (QPS drops in half) for one of the large servers (delll32 below). That is the subject of a pending blog post.

Builds

I used MySQL 8.0.28, 8.0.30, 8.0.33, 8.0.39 and 8.0.40 compiled from source using CMAKE_BUILD_TYPE =Release, -O2 and -fno-omit-frame-pointer.

Hardware

The servers are
  • beelink
    • Beelink SER4 with an AMD Ryzen 7 4700 CPU with SMT disabled, 8 cores, 16G of RAM, Ubuntu 22.04 and ext4 on 1 NVMe device.
  • pn53
    • ASUS ExpertCenter PN53 with AMD Ryzen 7 7735HS, with SMT disabled, 8 cores, 32G RAM, Ubuntu 22.04 and ext4 on 1 NVMe device.
  • socket2
    • SuperMicro SuperWorkstation 7049A-T with 2 sockets, 12 cores/socket, 64G RAM, one m.2 SSD (2TB,  ext4). The CPUs are Intel Xeon Silver 4214R CPU @ 2.40GHz
  • dell32
    • Dell Precision 7865 Tower Workstation with 1 socket, 128G RAM, AMD Ryzen Threadripper PRO 5975WX with 32-Cores, 2 m.2 SSD (each 2TB, RAID SW 0, ext4)
  • ax162-s
    • AMD EPYC 9454P 48-Core Processor with SMT disabled, 128G RAM, Ubuntu 22.04 and ext4 on 2 NVMe devices with SW RAID 1. This is in the Hetzner cloud.
Benchmark

I used sysbench and my usage is explained here. A full run has 42 microbenchmarks and most test only 1 type of SQL statement. In some cases I skip the read-only tests that run prior to writes to save time. The database is cached by InnoDB.

The benchmark is run with ...
  • beelink - 1 thread, 1 table, 30M rows
  • pn53 - 1 thread, 1 table, 50M rows
  • socket2 - 16 threads, 8 tables, 10M rows/table
  • dell32 - 24 threads, 8 tables, 10M rows/table
  • ax162-s - 40 threads, 8 tables, 10M rows/table
Each microbenchmark runs for 300 seconds if read-only and 600 seconds otherwise. Prepared statements were enabled.

Results 

All of the results se relative QPS (rQPS) where:
  • rQPS is: (QPS for my version / QPS for base version)
  • base version is the QPS from MySQL 8.0.28
  • my version is one of the other versions (8.0.30, 8.0.33, 8.0.39, 8.0.40)
The scan microbenchmark is the canary in the coal mine for bug 111538 as most of the CPU time is spent in InnoDB. Regressions that arrived after 8.0.28 remain unfixed in 8.0.40. The rQPS drops from MySQL 8.0.30 to 8.0.40 on 4 of the 5 servers.

                rQPS    rQPS
server          8.0.30  8.0.40
------          ------  ------
beelink         0.89    0.78
pn53            0.91    0.83
socket2         0.90    0.84
dell32          0.70    0.91
ax162-s         0.91    0.83

The update-index microbenchmark also has large regressions after 8.0.28. The QPS drops from 8.0.30 to 8.0.40 on 2 servers (beelink, pn53), remains about the same on 2 of them (socket2, dell32) and improves on 1 (ax162-s). The result for dell32 is lousy as update QPS drops almost in half after 8.0.28 and I will have more about that in a pending blog post.

                rQPS    rQPS
server          8.0.30  8.0.40
------          ------  ------
beelink         0.88    0.76
pn53            0.91    0.78
socket2         0.89    0.92
dell32          0.56    0.56
ax162-s         0.80    0.90

Results: beelink

Summary
  • the big improvement to random-points starting in 8.0.33 is from fixing bug 102037
  • for scan the problem is new CPU overhead after 8.0.28
  • for update-index the problem is more CPU overhead, (maybe) more context switches and (maybe) more KB written to storage per update
Relative to: x.my8028_rel_o2nofp.z11a_bee.pk1
col-1 : x.my8030_rel_o2nofp.z11a_bee.pk1
col-2 : x.my8033_rel_o2nofp.z11a_bee.pk1
col-3 : x.my8039_rel_o2nofp.z11a_bee.pk1
col-4 : x.my8040_rel_o2nofp.z11a_bee.pk1

col-1 col-2 col-3 col-4
0.91 1.06 1.09 1.13 hot-points_range=100
1.00 0.91 0.92 0.92 point-query.pre_range=100
1.00 0.91 0.93 0.92 point-query_range=100
0.91 1.01 1.03 1.06 points-covered-pk.pre_range=100
0.90 1.00 1.03 1.06 points-covered-pk_range=100
0.89 1.00 1.03 1.07 points-covered-si.pre_range=100
0.89 1.00 1.03 1.06 points-covered-si_range=100
0.89 1.00 1.02 1.04 points-notcovered-pk.pre_range=100
0.89 1.00 1.02 1.05 points-notcovered-pk_range=100
0.88 0.93 0.98 1.01 points-notcovered-si.pre_range=100
0.88 0.93 0.98 1.01 points-notcovered-si_range=100
0.96 2.20 2.30 2.35 random-points.pre_range=1000
0.90 1.00 1.03 1.05 random-points.pre_range=100
0.94 0.91 0.93 0.94 random-points.pre_range=10
0.94 2.20 2.30 2.35 random-points_range=1000
0.90 1.00 1.03 1.05 random-points_range=100
0.93 0.91 0.93 0.94 random-points_range=10
0.95 0.90 0.90 0.96 range-covered-pk.pre_range=100
0.95 0.90 0.90 0.96 range-covered-pk_range=100
0.96 0.91 0.91 0.97 range-covered-si.pre_range=100
0.95 0.90 0.90 0.98 range-covered-si_range=100
0.97 0.93 0.94 0.97 range-notcovered-pk.pre_range=100
0.96 0.92 0.93 0.97 range-notcovered-pk_range=100
0.88 0.86 0.91 0.94 range-notcovered-si.pre_range=100
0.87 0.86 0.91 0.94 range-notcovered-si_range=100
0.97 0.94 0.93 0.98 read-only.pre_range=10000
0.97 0.93 0.93 0.94 read-only.pre_range=100
0.99 0.93 0.94 0.94 read-only.pre_range=10
0.98 0.95 0.94 0.99 read-only_range=10000
1.00 0.95 0.95 0.97 read-only_range=100
0.98 0.94 0.94 0.96 read-only_range=10
0.89 0.82 0.80 0.78 scan_range=100
0.95 0.88 0.89 0.90 delete_range=100
0.94 0.88 0.88 0.88 insert_range=100
0.97 0.93 0.93 0.95 read-write_range=100
0.96 0.92 0.92 0.93 read-write_range=10
0.88 0.82 0.84 0.76 update-index_range=100
0.94 0.87 0.89 0.90 update-inlist_range=100
0.96 0.89 0.90 0.91 update-nonindex_range=100
0.97 0.91 0.91 0.91 update-one_range=100
0.95 0.90 0.90 0.91 update-zipf_range=100
0.93 0.88 0.89 0.87 write-only_range=10000

For scan the problem is CPU overhead (see cpu/o below) which is 1.28X larger in 8.0.40 vs 8.0.28.

sb.met.scan.range100.pk1.dop1
--- absolute
cpu/o           cs/o    r/o     rKB/o   wKB/o   o/s     dbms
0.222553        2.534   0       0.001   0.035   55      x.my8028_rel_o2nofp
0.251913        6.640   0       0       0.052   49      x.my8030_rel_o2nofp
0.273314        7.107   0       0       0.039   45      x.my8033_rel_o2nofp
0.282176        15.578  0       0       0.053   44      x.my8039_rel_o2nofp
0.285792        7.622   0       0       0.041   43      x.my8040_rel_o2nofp
--- relative to first result
1.13            2.62    1       0.00    1.49    0.89    x.my8030_rel_o2nofp
1.23            2.80    1       0.00    1.11    0.82    x.my8033_rel_o2nofp
1.27            6.15    1       0.00    1.51    0.80    x.my8039_rel_o2nofp
1.28            3.01    1       0.00    1.17    0.78    x.my8040_rel_o2nofp

For update-index there are several changes after 8.0.28
  • CPU per update (cpu/o) is 1.33X larger in 8.0.40 vs 8.0.28
  • KB written per update (wKB/o) is 1.47X larger in 8.0.40 vs 8.0.28. 
    • A possible cause is that the writeback code path isn't slowed by regressions while the update codepath is slowed. But I am just waving my hands.
    • This one is confusing because I use the same redo log size for all versions. However, configuration for the redo log changes in 8.0.30 (see here) and as a result I used 15 X 1G log segments in 8.0.28 vs 32 X 480M log segments in 8.0.40. I am repeating tests for 8.0.28 with 32 X 480M log segments. Using the output of the Pages ..., written line from SHOW ENGINE INNODB STATUS the pages written per update value is 0.97 for 8.0.28 vs 1.31 for 8.0.40. Perhaps 8.0.30+ is more aggressive about doing dirty page writeback but I use the same values for innodb_max_dirty_pages_pct_lwm (=80) and innodb_max_dirty_pages_pct (=90) for 8.0.28 to 8.0.40. I repeated tests for 8.0.28 using 32 redo log segments, but results (QPS, HW metrics) doesn't change with that.
  • Context switches per update (cs/o) is 1.24X larger in 8.0.40 vs 8.0.28, but I am less sure this is a problem as the absolute rates (15.312 vs 18.917 per update) are not that big.
sb.met.update-index.range100.pk1.dop1
--- absolute
cpu/o           cs/o    r/o     rKB/o   wKB/o   o/s     dbms
0.008211        15.312  0       0       47.066  2649    x.my8028_rel_o2nofp
0.009487        17.353  0       0       66.043  2337    x.my8030_rel_o2nofp
0.009875        17.636  0       0       66.695  2178    x.my8033_rel_o2nofp
0.010162        17.959  0       0       67.203  2216    x.my8039_rel_o2nofp
0.010903        18.917  0       0       68.954  2024    x.my8040_rel_o2nofp
--- relative to first result
1.16            1.13    1       1       1.40    0.88    x.my8030_rel_o2nofp
1.20            1.15    1       1       1.42    0.82    x.my8033_rel_o2nofp
1.24            1.17    1       1       1.43    0.84    x.my8039_rel_o2nofp
1.33            1.24    1       1       1.47    0.76    x.my8040_rel_o2nofp

Results: pn53

Summary
  • the big improvement to random-points starting in 8.0.33 is from fixing bug 102037
  • for scan the problem is new CPU overhead after 8.0.28
  • for update-index the problem is more CPU overhead, (maybe) more context switches and (maybe) more KB written to storage per update
  • the changes to HW overheads here are similar to the changes above for the beelink server
Relative to: x.my8028_rel_o2nofp.z11a_c8r32.pk1
col-1 : x.my8030_rel_o2nofp.z11a_c8r32.pk1
col-2 : x.my8033_rel_o2nofp.z11a_c8r32.pk1
col-3 : x.my8039_rel_o2nofp.z11a_c8r32.pk1
col-4 : x.my8040_rel_o2nofp.z11a_c8r32.pk1

col-1 col-2 col-3 col-4
0.89 1.12 1.13 1.18 hot-points_range=100
0.97 0.94 0.93 0.94 point-query.pre_range=100
0.98 0.95 0.94 0.94 point-query_range=100
0.89 1.03 1.05 1.10 points-covered-pk_range=100
0.87 1.02 1.04 1.09 points-covered-si_range=100
0.91 1.03 1.05 1.11 points-notcovered-pk_range=100
0.84 0.96 1.01 1.06 points-notcovered-si_range=100
0.95 2.14 2.21 2.30 random-points_range=1000
0.89 1.03 1.06 1.10 random-points_range=100
0.96 0.94 0.95 0.97 random-points_range=10
0.97 0.93 0.92 0.99 range-covered-pk_range=100
0.98 0.93 0.93 0.98 range-covered-si_range=100
0.98 0.92 0.94 0.97 range-notcovered-pk_range=100
0.91 0.89 0.92 0.99 range-notcovered-si.pre_range=100
0.92 0.89 0.92 0.99 range-notcovered-si_range=100
0.97 0.94 0.98 1.01 read-only_range=10000
0.98 0.94 0.95 0.96 read-only_range=100
0.98 0.94 0.94 0.96 read-only_range=10
0.91 0.83 0.83 0.83 scan_range=100
0.96 0.91 0.92 0.92 delete_range=100
0.94 0.90 0.90 0.91 insert_range=100
0.97 0.93 0.94 0.95 read-write_range=100
0.97 0.93 0.94 0.95 read-write_range=10
0.91 0.85 0.84 0.78 update-index_range=100
0.94 0.91 0.91 0.94 update-inlist_range=100
0.96 0.92 0.92 0.93 update-nonindex_range=100
0.96 0.92 0.92 0.92 update-one_range=100
0.96 0.92 0.92 0.93 update-zipf_range=100
0.94 0.90 0.91 0.90 write-only_range=10000

For scan the problem is CPU overhead (see cpu/o below) which is 1.22X larger in 8.0.40 vs 8.0.28.

sb.met.scan.range100.pk1.dop1
--- absolute
cpu/o           cs/o    r/o     rKB/o   wKB/o   o/s     dbms
0.532922        6.523   0       0       0.092   23      x.my8028_rel_o2nofp
0.586497        15.751  0       0       0.122   21      x.my8030_rel_o2nofp
0.650201        17.147  0       0       0.105   19      x.my8033_rel_o2nofp
0.651073        17.145  0       0       0.098   19      x.my8039_rel_o2nofp
0.649427        17.975  0       0       0.128   19      x.my8040_rel_o2nofp
--- relative to first result
1.10            2.41    1       1       1.33    0.91    x.my8030_rel_o2nofp
1.22            2.63    1       1       1.14    0.83    x.my8033_rel_o2nofp
1.22            2.63    1       1       1.07    0.83    x.my8039_rel_o2nofp
1.22            2.76    1       1       1.39    0.83    x.my8040_rel_o2nofp

For update-index there are several changes after 8.0.28
  • the changes here are similar to the ones above for the beelink server
  • CPU per update (cpu/o) is 1.29X larger in 8.0.40 vs 8.0.28
  • KB written per update (wKB/o) is 1.48X larger in 8.0.40 vs 8.0.28
  • context switches per update (cs/o) is 1.22X larger in 8.0.40 vs 8.0.28
sb.met.update-index.range100.pk1.dop1
--- absolute
cpu/o           cs/o    r/o     rKB/o   wKB/o   o/s     dbms
0.004972        11.052  0       0       43.246  4198    x.my8028_rel_o2nofp
0.005658        11.948  0       0       61.606  3810    x.my8030_rel_o2nofp
0.005927        12.288  0       0       62.306  3578    x.my8033_rel_o2nofp
0.005997        12.536  0       0       62.804  3544    x.my8039_rel_o2nofp
0.006401        13.486  0       0       63.98   3294    x.my8040_rel_o2nofp
--- relative to first result
1.14            1.08    1       1       1.42    0.91    x.my8030_rel_o2nofp
1.19            1.11    1       1       1.44    0.85    x.my8033_rel_o2nofp
1.21            1.13    1       1       1.45    0.84    x.my8039_rel_o2nofp
1.29            1.22    1       1       1.48    0.78    x.my8040_rel_o2nofp

Results: socket2

Summary
  • the big improvement to random-points starting in 8.0.33 is from fixing bug 102037
  • for scan the problem is new CPU overhead after 8.0.28
  • for update-index the problem is more CPU overhead, (maybe) more context switches and (maybe) more KB written to storage per update
  • the changes to HW overheads here and the regressions here are similar to but smaller than the changes above for the beelink and pn53 servers
Relative to: x.my8028_rel_o2nofp.z11a_c24r64.pk1
col-1 : x.my8030_rel_o2nofp.z11a_c24r64.pk1
col-2 : x.my8033_rel_o2nofp.z11a_c24r64.pk1
col-3 : x.my8039_rel_o2nofp.z11a_c24r64.pk1
col-4 : x.my8040_rel_o2nofp.z11a_c24r64.pk1

col-1 col-2 col-3 col-4
0.94 1.04 1.01 1.10 hot-points_range=100
0.97 0.94 0.97 0.96 point-query.pre_range=100
0.99 0.96 0.97 0.97 point-query_range=100
0.92 1.00 1.04 1.06 points-covered-pk_range=100
0.90 0.98 1.03 1.04 points-covered-si_range=100
0.92 1.00 1.04 1.06 points-notcovered-pk_range=100
0.89 0.94 0.98 1.00 points-notcovered-si_range=100
0.93 1.76 1.81 1.87 random-points_range=1000
0.92 1.00 1.03 1.06 random-points_range=100
0.94 0.93 0.95 0.97 random-points_range=10
0.98 0.94 0.95 1.00 range-covered-pk_range=100
0.97 0.94 0.94 1.00 range-covered-si_range=100
0.98 0.94 0.95 0.98 range-notcovered-pk_range=100
0.91 0.91 0.94 0.98 range-notcovered-si.pre_range=100
0.90 0.90 0.93 0.98 range-notcovered-si_range=100
0.98 0.97 0.97 1.00 read-only_range=10000
0.99 0.96 0.96 0.98 read-only_range=100
0.99 0.95 0.96 0.97 read-only_range=10
0.90 0.84 0.80 0.84 scan_range=100
0.95 0.93 0.94 0.94 delete_range=100
0.93 0.92 0.93 0.93 insert_range=100
0.97 0.94 0.95 0.96 read-write_range=100
0.97 0.94 0.95 0.96 read-write_range=10
0.89 0.90 0.88 0.92 update-index_range=100
0.96 0.96 0.98 0.98 update-inlist_range=100
0.95 0.94 0.94 0.94 update-nonindex_range=100
0.95 0.93 0.93 0.93 update-one_range=100
0.96 0.95 0.94 0.94 update-zipf_range=100
0.91 0.90 0.92 0.90 write-only_range=10000

For scan the problem is CPU overhead (see cpu/o below) which is 1.16X larger in 8.0.40 vs 8.0.28.

sb.met.scan.range100.pk1.dop8
--- absolute
cpu/o           cs/o    r/o     rKB/o   wKB/o   o/s     dbms
0.180593        4.756   0       0       0.007   174     x.my8028_rel_o2nofp
0.199852        5.846   0       0       0.008   156     x.my8030_rel_o2nofp
0.213651        6.156   0       0       0.008   146     x.my8033_rel_o2nofp
0.223563        6.615   0       0       0.008   140     x.my8039_rel_o2nofp
0.209727        6.136   0       0       0.008   146     x.my8040_rel_o2nofp
--- relative to first result
1.11            1.23    1       1       1.14    0.90    x.my8030_rel_o2nofp
1.18            1.29    1       1       1.14    0.84    x.my8033_rel_o2nofp
1.24            1.39    1       1       1.14    0.80    x.my8039_rel_o2nofp
1.16            1.29    1       1       1.14    0.84    x.my8040_rel_o2nofp

For update-index there are several changes after 8.0.28
  • the changes here are similar to the ones above for the beelink and pn53 servers, but not as large. And the regression here is also not as large.
  • CPU per update (cpu/o) is 1.09X larger in 8.0.40 vs 8.0.28
  • KB written per update (wKB/o) is 1.30X larger in 8.0.40 vs 8.0.28
  • context switches per update (cs/o) is 1.05X larger in 8.0.40 vs 8.0.28
sb.met.update-index.range100.pk1.dop16
--- absolute
cpu/o           cs/o    r/o     rKB/o   wKB/o   o/s     dbms
0.001677        11.125  0       0       16.578  38477   x.my8028_rel_o2nofp
0.001859        12.496  0       0       24.596  34256   x.my8030_rel_o2nof
0.001849        11.945  0       0       22.664  34786   x.my8033_rel_o2nofp
0.001907        12.115  0       0       24.136  33971   x.my8039_rel_o2nofp
0.001823        11.670  0       0       21.581  35336   x.my8040_rel_o2nofp
--- relative to first result
1.11            1.12    1       1       1.48    0.89    x.my8030_rel_o2nofp
1.10            1.07    1       1       1.37    0.90    x.my8033_rel_o2nofp
1.14            1.09    1       1       1.46    0.88    x.my8039_rel_o2nofp
1.09            1.05    1       1       1.30    0.92    x.my8040_rel_o2nofp

Results: dell32

Summary
  • the big improvement to random-points starting in 8.0.33 is from fixing bug 102037
  • for scan the problem is new CPU overhead after 8.0.28
  • for update-index the problem is more CPU overhead, (maybe) more context switches and (maybe) more KB written to storage per update. I still don't understand why the regression for update-index here is so much worse than on the other servers.
Relative to: x.my8028_rel_o2nofp.z11a_c32r128.pk1
col-1 : x.my8030_rel_o2nofp.z11a_c32r128.pk1
col-2 : x.my8033_rel_o2nofp.z11a_c32r128.pk1
col-3 : x.my8039_rel_o2nofp.z11a_c32r128.pk1
col-4 : x.my8040_rel_o2nofp.z11a_c32r128.pk1

col-1 col-2 col-3 col-4
0.91 1.01 1.04 1.06 hot-points_range=100
0.98 0.94 0.93 0.95 point-query.pre_range=100
0.98 0.94 0.92 0.95 point-query_range=100
0.89 0.96 1.00 1.05 points-covered-pk.pre_range=100
0.89 0.96 1.00 1.05 points-covered-pk_range=100
0.87 0.93 1.00 1.04 points-covered-si.pre_range=100
0.88 0.94 1.01 1.05 points-covered-si_range=100
0.89 0.96 1.00 1.05 points-notcovered-pk.pre_range=100
0.89 0.96 1.00 1.05 points-notcovered-pk_range=100
0.89 0.90 0.98 1.02 points-notcovered-si.pre_range=100
0.88 0.90 0.97 1.01 points-notcovered-si_range=100
0.92 1.73 1.80 1.89 random-points.pre_range=1000
0.89 0.96 1.00 1.05 random-points.pre_range=100
0.94 0.92 0.93 0.96 random-points.pre_range=10
0.92 1.75 1.82 1.91 random-points_range=1000
0.89 0.96 1.00 1.05 random-points_range=100
0.94 0.92 0.93 0.96 random-points_range=10
0.96 0.93 0.93 0.99 range-covered-pk.pre_range=100
0.96 0.93 0.93 0.99 range-covered-pk_range=100
0.97 0.93 0.93 0.99 range-covered-si.pre_range=100
0.97 0.93 0.93 0.99 range-covered-si_range=100
0.96 0.92 0.93 0.96 range-notcovered-pk.pre_range=100
0.96 0.92 0.93 0.96 range-notcovered-pk_range=100
0.88 0.86 0.92 0.97 range-notcovered-si.pre_range=100
0.88 0.86 0.91 0.97 range-notcovered-si_range=100
0.98 0.96 0.98 1.01 read-only.pre_range=10000
0.97 0.93 0.94 0.95 read-only.pre_range=100
0.97 0.94 0.93 0.95 read-only.pre_range=10
0.99 0.96 0.97 1.01 read-only_range=10000
0.97 0.93 0.94 0.95 read-only_range=100
0.97 0.94 0.94 0.95 read-only_range=10
0.70 0.87 0.85 0.91 scan_range=100
0.94 0.92 0.93 0.93 delete_range=100
0.95 0.94 0.94 0.94 insert_range=100
0.97 0.93 0.94 0.95 read-write_range=100
0.97 0.93 0.93 0.95 read-write_range=10
0.56 0.56 0.56 0.56 update-index_range=100
0.99 0.99 1.00 1.03 update-inlist_range=100
0.96 0.95 0.95 0.95 update-nonindex_range=100
0.96 0.93 0.94 0.94 update-one_range=100
0.96 0.95 0.95 0.96 update-zipf_range=100
0.95 0.93 0.93 0.94 write-only_range=10000

For scan the problem is CPU overhead (see cpu/o below) which is 1.13X larger in 8.0.40 vs 8.0.28.

sb.met.scan.range100.pk1.dop8
--- absolute
cpu/o           cs/o    r/o     rKB/o   wKB/o   o/s     dbms
0.093496        3.256   0       0       0.006   246     x.my8028_rel_o2nofp
0.104925        5.301   0       0       0.007   172     x.my8030_rel_o2nofp
0.110480        4.195   0       0       0.006   215     x.my8033_rel_o2nofp
0.113259        4.306   0       0       0.006   210     x.my8039_rel_o2nofp
0.106105        4.065   0       0       0.006   225     x.my8040_rel_o2nofp
--- relative to first result
1.12            1.63    1       1       1.17    0.70    x.my8030_rel_o2nofp
1.18            1.29    1       1       1.00    0.87    x.my8033_rel_o2nofp
1.21            1.32    1       1       1.00    0.85    x.my8039_rel_o2nofp
1.13            1.25    1       1       1.00    0.91    x.my8040_rel_o2nofp

For update-index there are several changes after 8.0.28
  • the regression here is huge and I am not sure about the root causes
  • CPU per update (cpu/o) is 1.50X larger in 8.0.40 vs 8.0.28
  • KB written per update (wKB/o) is 2.32X larger in 8.0.40 vs 8.0.28
  • context switches per update (cs/o) is 1.41X larger in 8.0.40 vs 8.0.28
sb.met.update-index.range100.pk1.dop24
--- absolute
cpu/o           cs/o    r/o     rKB/o   wKB/o   o/s     dbms
0.001185        10.777  0       0       14.275  51319   x.my8028_rel_o2nofp
0.001739        15.475  0       0       33.68   28515   x.my8030_rel_o2nofp
0.001794        15.192  0       0       32.953  28872   x.my8033_rel_o2nofp
0.001783        15.196  0       0       33.05   28983   x.my8039_rel_o2nofp
0.001781        15.242  0       0       33.139  28966   x.my8040_rel_o2nofp
--- relative to first result
1.47            1.44    1       1       2.36    0.56    x.my8030_rel_o2nofp
1.51            1.41    1       1       2.31    0.56    x.my8033_rel_o2nofp
1.50            1.41    1       1       2.32    0.56    x.my8039_rel_o2nofp
1.50            1.41    1       1       2.32    0.56    x.my8040_rel_o2nofp

Results: ax162-s

Summary
  • the big improvement to random-points starting in 8.0.33 is from fixing bug 102037
  • for scan the problem is new CPU overhead after 8.0.28
  • for update-index the problem is more CPU overhead, (maybe) more context switches and (maybe) more KB written to storage per update
  • the changes to HW overheads here and the regressions here are similar to but smaller than the changes above for the beelink and pn53 servers
Relative to: x.my8028_rel_o2nofp.z11a_c32r128.pk1
col-1 : x.my8030_rel_o2nofp.z11a_c32r128.pk1
col-2 : x.my8033_rel_o2nofp.z11a_c32r128.pk1
col-3 : x.my8039_rel_o2nofp.z11a_c32r128.pk1
col-4 : x.my8040_rel_o2nofp.z11a_c32r128.pk1

col-1 col-2 col-3 col-4
0.89 0.93 0.99 0.99 hot-points_range=100
0.98 0.94 0.94 0.94 point-query.pre_range=100
0.96 0.93 0.93 0.94 point-query_range=100
0.90 0.96 1.01 1.02 points-covered-pk.pre_range=100
0.89 0.95 1.00 1.02 points-covered-pk_range=100
0.87 0.89 0.95 0.98 points-covered-si.pre_range=100
0.88 0.89 0.96 0.99 points-covered-si_range=100
0.90 0.96 1.01 1.02 points-notcovered-pk.pre_range=100
0.89 0.95 1.00 1.02 points-notcovered-pk_range=100
0.86 0.88 0.94 0.96 points-notcovered-si.pre_range=100
0.87 0.88 0.94 0.96 points-notcovered-si_range=100
0.95 1.61 1.68 1.72 random-points.pre_range=1000
0.90 0.96 1.01 1.02 random-points.pre_range=100
0.92 0.91 0.95 0.94 random-points.pre_range=10
0.95 1.62 1.69 1.72 random-points_range=1000
0.89 0.95 1.00 1.02 random-points_range=100
0.91 0.90 0.94 0.94 random-points_range=10
0.92 0.90 0.92 0.95 range-covered-pk.pre_range=100
0.92 0.90 0.92 0.95 range-covered-pk_range=100
0.93 0.91 0.93 0.96 range-covered-si.pre_range=100
0.94 0.91 0.94 0.96 range-covered-si_range=100
0.93 0.90 0.93 0.94 range-notcovered-pk.pre_range=100
0.93 0.90 0.93 0.94 range-notcovered-pk_range=100
0.87 0.87 0.92 0.93 range-notcovered-si.pre_range=100
0.87 0.86 0.92 0.93 range-notcovered-si_range=100
0.99 0.97 0.98 1.01 read-only.pre_range=10000
0.95 0.92 0.93 0.94 read-only.pre_range=100
0.96 0.92 0.94 0.94 read-only.pre_range=10
0.99 0.97 0.98 1.00 read-only_range=10000
0.95 0.92 0.93 0.93 read-only_range=100
0.96 0.92 0.93 0.94 read-only_range=10
0.91 0.82 0.81 0.83 scan_range=100
0.93 0.91 0.94 0.93 delete_range=100
0.96 0.94 0.94 0.94 insert_range=100
0.95 0.94 0.95 0.95 read-write_range=100
0.96 0.94 0.95 0.95 read-write_range=10
0.80 0.85 0.83 0.90 update-index_range=100
0.99 1.04 1.03 1.02 update-inlist_range=100
0.98 0.97 0.96 0.96 update-nonindex_range=100
0.96 0.93 0.94 0.95 update-one_range=100
0.97 0.97 0.96 0.95 update-zipf_range=100
0.94 0.93 0.94 0.92 write-only_range=10000

For scan the problem is CPU overhead (see cpu/o below) which is 1.20X larger in 8.0.40 vs 8.0.28.

sb.met.scan.range100.pk1.dop8
--- absolute
cpu/o           cs/o    r/o     rKB/o   wKB/o   o/s     dbms
0.018767        0.552   0       0       0.052   872     x.my8028_rel_o2nofp
0.020741        0.746   0       0       0.016   793     x.my8030_rel_o2nofp
0.022690        0.808   0       0       0.004   713     x.my8033_rel_o2nofp
0.023079        0.791   0       0       0.003   706     x.my8039_rel_o2nofp
0.022533        0.800   0       0       0.013   725     x.my8040_rel_o2nofp
--- relative to first result
1.11            1.35    1       1       0.31    0.91    x.my8030_rel_o2nofp
1.21            1.46    1       1       0.08    0.82    x.my8033_rel_o2nofp
1.23            1.43    1       1       0.06    0.81    x.my8039_rel_o2nofp
1.20            1.45    1       1       0.25    0.83    x.my8040_rel_o2nofp

For update-index there are several changes after 8.0.28
  • the changes here are similar to the ones above for the beelink and pn53 servers, but not as large. And the regression here is also not as large
  • CPU per update (cpu/o) is 1.09X larger in 8.0.40 vs 8.0.28
  • KB written per update (wKB/o) is 1.37X larger in 8.0.40 vs 8.0.28
  • context switches per update (cs/o) is 1.03X larger in 8.0.40 vs 8.0.28
sb.met.update-index.range100.pk1.dop40
--- absolute
cpu/o           cs/o    r/o     rKB/o   wKB/o   o/s     dbms
0.000550        12.157  0       0       8.354   83129   x.my8028_rel_o2nofp
0.000641        13.426  0       0       14.791  66432   x.my8030_rel_o2nofp
0.000623        12.988  0       0       13.155  70890   x.my8033_rel_o2nofp
0.000637        13.044  0       0       14.158  68996   x.my8039_rel_o2nofp
0.000600        12.514  0       0       11.416  74917   x.my8040_rel_o2nofp
--- relative to first result
1.17            1.10    1       1       1.77    0.80    x.my8030_rel_o2nofp
1.13            1.07    1       1       1.57    0.85    x.my8033_rel_o2nofp
1.16            1.07    1       1       1.69    0.83    x.my8039_rel_o2nofp
1.09            1.03    1       1       1.37    0.90    x.my8040_rel_o2nofp




No comments:

Post a Comment

The impact of PGO, LTO and more for MySQL on a small CPU: insert benchmark

This post has results for the impact of PGO, LTO and other compiler options for MySQL using the Insert Benchmark . I previously shared resu...