Posts

Showing posts from August, 2024

Postgres 17beta3 vs the Insert Benchmark on a medium server: looking good

This has benchmark results for Postgres 12 through 17beta3 using the Insert Benchmark and a medium server. By small, medium or large server I mean fewer than 10 cores for small, 10 to 19 cores for medium, and 20 or more cores for large. A recent result up to Postgres 17beta2 from the same server is here. This work was done by Small Datum LLC.

tl;dr
- 17beta3 looks (mostly) good
- There might be regressions in 17beta1, beta2 and beta3 on the l.i1 and l.i2 benchmark steps related to get_actual_variable_range

Builds, configuration and hardware

I compiled Postgres versions 12.19, 12.20, 13.15, 13.16, 14.12, 14.13, 15.7, 15.8, 16.3, 16.4, 17beta1, 17beta2 and 17beta3 from source using -O2 -fno-omit-frame-pointer. The server is a c2d-highcpu-32 instance type on GCP (c2d high-CPU) with 32 vCPU, 64G RAM and SMT disabled, so there are 16 cores. It uses Ubuntu 22.04 and storage is ext4 (data=writeback) using SW RAID 0 over 2 locally attached NVMe devices. The configuration file is in the pg* subdirectories ...
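As a rough sketch of that build step: the post only names the CFLAGS, so the configure options beyond CFLAGS, the install prefix and the job count below are assumptions.

    # Hypothetical helper that builds one Postgres version from source with
    # the CFLAGS named above. Prefix and job count are assumed, not from the post.
    import subprocess

    def build_postgres(src_dir, prefix, jobs=16):
        cflags = "-O2 -fno-omit-frame-pointer"
        subprocess.run(["./configure", f"--prefix={prefix}", f"CFLAGS={cflags}"],
                       cwd=src_dir, check=True)
        subprocess.run(["make", f"-j{jobs}"], cwd=src_dir, check=True)
        subprocess.run(["make", "install"], cwd=src_dir, check=True)

    # Example: build_postgres("postgres-17beta3", "/path/to/install/pg17beta3")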

Postgres 17beta3 vs sysbench on a small server: looking good

This has benchmark results for Postgres 17beta3 using sysbench and a small server. By small, medium or large server I mean fewer than 10 cores for small, 10 to 19 cores for medium, and 20 or more cores for large. A recent result for Postgres 17beta3 on a medium server is here.

tl;dr - 17beta3 looks great
- There are no regressions
- Throughput on write microbenchmarks is often ~5% to ~10% better than 16.x
- Throughput on hot-points (read-only) is more than 2X faster in 17beta than in 16.x

Builds, configuration and hardware

I compiled Postgres versions 10.23, 11.22, 12.20, 13.16, 14.13, 15.7, 15.8, 16.0, 16.1, 16.2, 16.3, 16.4, 17beta1, 17beta2 and 17beta3 from source. The server is named v3 or PN53 here and has 8 AMD cores with SMT disabled, 16G of RAM, Ubuntu 22.04 and XFS with 1 m.2 device. I need to switch to ext4 soon to match what I use elsewhere. The configuration files have the name conf.diff.cx10a_c8r32 and are in the pg* subdirectories here.

Benchmark

I used sysbench and my usage is ex ...
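The exact scripts and options are in the linked usage notes, which the preview cuts off. Purely as an illustration of the shape of a sysbench run against Postgres (stock oltp_* Lua scripts and placeholder table counts, sizes and connection settings, not the author's actual configuration):

    # Illustrative only: a stock sysbench invocation against Postgres.
    # Script name, table counts, sizes and connection settings are placeholders.
    import subprocess

    def run_sysbench(script="oltp_read_write", threads=1, tables=8,
                     table_size=1000000, seconds=300):
        base = ["sysbench", script,
                "--db-driver=pgsql", "--pgsql-db=test",
                f"--threads={threads}", f"--tables={tables}",
                f"--table-size={table_size}", f"--time={seconds}"]
        subprocess.run(base + ["prepare"], check=True)   # load the tables
        subprocess.run(base + ["run"], check=True)       # run the microbenchmark

    # run_sysbench("oltp_point_select", threads=1)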

Postgres 17beta3 vs sysbench on a medium server: looking good

This has benchmark results for Postgres 17beta3 using sysbench and a medium server. By small, medium or large server I mean fewer than 10 cores for small, 10 to 19 cores for medium, and 20 or more cores for large. A recent result for Postgres 17beta2 is here.

tl;dr
- 17beta3 looks good
- Write microbenchmarks are much faster in 17beta1 and 17beta2 vs 16.3
- There might be a regression in Postgres 16 for two of the update-only benchmarks. More work is in progress to explain this.
- Read microbenchmarks have similar performance between 16.3, 17beta1 and 17beta2

Builds, configuration and hardware

I compiled Postgres versions 12.19, 12.20, 13.15, 13.16, 14.12, 14.13, 15.7, 15.8, 16.3, 16.4, 17beta1, 17beta2 and 17beta3 from source. The server is a c2d-highcpu-32 instance type on GCP (c2d high-CPU) with 32 vCPU, 64G RAM and SMT disabled, so there are 16 cores. It uses Ubuntu 22.04 and storage is ext4 (data=writeback) using SW RAID 0 over 2 locally attached NVMe devices. The configuration files have the name conf. ...

MySQL regressions: delete vs InnoDB

I started to look at CPU overheads in MyRocks and upstream InnoDB. While I am happy to file bugs for MyRocks as they are likely to be fixed, I am not sure how much energy I want to put into proper bug reports for upstream InnoDB. So I will just write blog posts about them for now.

I created flamegraphs while running sysbench with cached databases (more likely to be CPU bound) and the problem here occurs on an 8-core PN53 where sysbench was run with 1 thread. I use perf record -e cycles to collect data for flamegraphs and then focus on the percentage of samples in a given function (and its callees) as a proxy for CPU time. I used cycles as the HW counter with perf, but I have more tests in progress to get flamegraphs with other counters. The flamegraphs are here.

I am curious about whether more data is written to the binlog per transaction with 8.0 ...
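A minimal sketch of that flamegraph pipeline, assuming Brendan Gregg's FlameGraph scripts (stackcollapse-perf.pl, flamegraph.pl) are on the PATH; the target PID, recording duration and output names are placeholders, not values from the post.

    # Sketch of the perf -> flamegraph pipeline described above.
    import subprocess

    def make_flamegraph(pid, seconds=60, out="mysqld.svg"):
        # Sample on-CPU stacks with the cycles event for the target process
        subprocess.run(["perf", "record", "-e", "cycles", "-g",
                        "-p", str(pid), "--", "sleep", str(seconds)], check=True)
        stacks = subprocess.run(["perf", "script"], check=True,
                                capture_output=True, text=True).stdout
        folded = subprocess.run(["stackcollapse-perf.pl"], input=stacks,
                                capture_output=True, text=True, check=True).stdout
        svg = subprocess.run(["flamegraph.pl"], input=folded,
                             capture_output=True, text=True, check=True).stdout
        with open(out, "w") as f:
            f.write(svg)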

MySQL regressions: update-nonindex vs InnoDB

I started to look at CPU overheads in MyRocks and upstream InnoDB. While I am happy to file bugs for MyRocks as they are likely to be fixed, I am not sure how much energy I want to put into proper bug reports for upstream InnoDB. So I will just write blog posts about them for now.

I created flamegraphs while running sysbench with cached databases (more likely to be CPU bound) and the problem here occurs on an 8-core PN53 where sysbench was run with 1 thread. I use perf record -e cycles to collect data for flamegraphs and then focus on the percentage of samples in a given function (and its callees) as a proxy for CPU overhead. The flamegraphs are here.

The workload here is the update-nonindex microbenchmark. Throughput for a release relative to MySQL 5.6.51 is (QPS for $version) / (QPS for 5.6.51), and the results below show that 8.0.37 gets about 62% of the QPS of 5.6.51:
- 0.86 in 5.7.44
- 0.79 in 8.0.11
- 0.67 in 8.0.28
- 0.62 in 8.0.37

From the numbers above th ...
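A small sketch of that relative-throughput arithmetic; only the ratios come from the post, the absolute QPS values below are made up for illustration.

    # Relative throughput as defined above: QPS for a version divided by
    # QPS for the 5.6.51 base.
    def relative_qps(qps_by_version, base="5.6.51"):
        base_qps = qps_by_version[base]
        return {v: round(q / base_qps, 2) for v, q in qps_by_version.items()}

    # Hypothetical QPS numbers, not measurements from the post:
    example = {"5.6.51": 10000, "5.7.44": 8600, "8.0.11": 7900,
               "8.0.28": 6700, "8.0.37": 6200}
    print(relative_qps(example))
    # -> {'5.6.51': 1.0, '5.7.44': 0.86, '8.0.11': 0.79, '8.0.28': 0.67, '8.0.37': 0.62}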

MySQL regressions: skip_concurrency_ticket

I started to look at CPU overheads in MyRocks and upstream InnoDB. While I am happy to file bugs for MyRocks as they are likely to be fixed, I am not sure how much energy I want to put into proper bug reports for upstream InnoDB. So I will just write blog posts about them for now.

I created flamegraphs while running sysbench with cached databases (more likely to be CPU bound) and the problem here occurs on an 8-core PN53 where sysbench was run with 1 thread. I use perf record -e cycles to collect data for flamegraphs and then focus on the percentage of samples in a given function (and its callees) as a proxy for CPU overhead.

The problem here is that during the scan benchmark the skip_concurrency_ticket function accounts for ~3% of CPU in 8.0.37, half that in 5.7.44, and the function doesn't exist in 5.6.51. It is called from innobase_srv_conc_enter_innodb, which was a bit simpler in 5.6. The flamegraphs (*.svg files) are here. Also visible in those flamegraphs, th ...
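One way to get the "function and its callees" share is from the folded stacks that feed the flamegraph. A sketch, assuming the stackcollapse-perf.pl output format (semicolon-separated frames followed by a sample count); the file name in the example is hypothetical.

    # Share of samples where a given function appears anywhere on the stack,
    # which counts the function plus its callees. Uses a simple substring
    # match on the folded stack string.
    def oncpu_share(folded_path, func):
        total = matched = 0
        with open(folded_path) as f:
            for line in f:
                if not line.strip():
                    continue
                stack, _, count = line.rpartition(" ")
                n = int(count)
                total += n
                if func in stack:
                    matched += n
        return 100.0 * matched / total

    # Example (path is illustrative):
    # oncpu_share("my8037.folded", "skip_concurrency_ticket")  # ~3 in 8.0.37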

MySQL regressions: binlog_log_row

I started to look at CPU overheads in MyRocks and upstream InnoDB. While I am happy to file bugs for MyRocks as they are likely to be fixed, I am not sure how much energy I want to put into proper bug reports for upstream InnoDB. So I will just write blog posts about them for now.

I created flamegraphs while running sysbench with cached databases (more likely to be CPU bound) and the problem here occurs on an 8-core PN53 where sysbench was run with 1 thread. For the insert microbenchmark the insert rate ...
- in MySQL 5.7.44 is 82% of the rate in MySQL 5.6.51
- in MySQL 8.0.11 is 72% of the rate in MySQL 5.6.51
- in MySQL 8.0.37 is 57% of the rate in MySQL 5.6.51

Here I use perf record -e cycles to collect data for flamegraphs and then I focus on the percentage of samples in a given function (and its callees) as a proxy for CPU overhead. The percentage of samples that binlog_log_row and its children account for is ...
- 3.04% in MySQL 5.6.51
- 4.28% in MySQL 5.7.44
- 10.29% in MySQL 8.0.37

I d ...
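To put the two lists together, a quick bit of arithmetic on the percentages above (the percentages are from the post; the growth factors are derived from them):

    # Growth in the binlog_log_row (and children) share of samples
    # relative to 5.6.51, using the percentages listed above.
    binlog_share = {"5.6.51": 3.04, "5.7.44": 4.28, "8.0.37": 10.29}

    for version, share in binlog_share.items():
        growth = share / binlog_share["5.6.51"]
        print(f"{version}: binlog_log_row share {share}% ({growth:.1f}x vs 5.6.51)")
    # 8.0.37 spends roughly 3.4x more of its samples in binlog_log_row than
    # 5.6.51 while getting only 57% of the 5.6.51 insert rate.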