tag:blogger.com,1999:blog-91495239278647510872024-03-18T11:43:35.965-07:00Small DatumMark Callaghanhttp://www.blogger.com/profile/09590445221922043181noreply@blogger.comBlogger623125tag:blogger.com,1999:blog-9149523927864751087.post-21002423753827524952024-03-18T10:08:00.000-07:002024-03-18T10:08:58.002-07:00Comparing Postgres and MySQL on the insert benchmark with a small server<p>My primary goal with the benchmarks I run has been to identify performance regressions, especially ones that can be fixed to make open source databases better. And so I focus on comparing old and new versions of one DBMS at a time to identify where things get better or worse. But here I compare Postgres with MySQL (InnoDB & MyRocks) to show that neither is the best for the <a href="https://smalldatum.blogspot.com/2023/12/updates-for-insert-benchmark-december.html">Insert Benchmark</a> -- all are good, but none are perfect.<br /><br />The per-DBMS results are here for <a href="https://smalldatum.blogspot.com/2024/03/trying-to-tune-postgres-for-insert.html">Postgres</a>, <a href="https://smalldatum.blogspot.com/2024/03/yes-another-insert-benchmark-result.html">InnoDB</a> and <a href="https://smalldatum.blogspot.com/2024/03/yes-another-insert-benchmark-result_17.html">MyRocks</a>. Those posts also have links to the configurations and builds that I used. This post shares the same result but makes it easier to compare across DBMS. <br /><br />Results here are from a small server (8 cores) with a low concurrency workload (1 client, <= 3 concurrent connections). 
Results from a larger server are pending and might not be the same as what I share here.</p><p>Summary of throughput for the IO-bound workload</p><p></p><ul style="text-align: left;"><li>Initial load in key order (l.i0)</li><ul><li>Postgres is fastest</li></ul><li>Write-only with secondary index maintenance (l.i1, l.i2)</li><ul><li>MyRocks is fastest</li></ul><li>Range queries (qr100, qr500, qr1000)</li><ul><li>Postgres is fastest</li></ul><li>Point queries (qp100, qp500, qp1000)</li><ul><li>MyRocks is fastest, Postgres failed to sustain the target write rate for qp1000</li></ul></ul><div>Summary of efficiency for the IO-bound workload</div><ul style="text-align: left;"><li>Space efficiency</li><ul><li>MyRocks is best, Postgres/InnoDB used ~3.5X/~3X more space</li></ul><li>Write efficiency</li><ul><li>MyRocks is best; on the l.i1 benchmark step Postgres and InnoDB write ~9X and ~80X more KB to storage per insert than MyRocks.</li></ul><li>Read efficiency</li><ul><li>MyRocks is the best, which might surprise people. Both InnoDB and Postgres do more read IO per query for both point and range queries. Bloom filters and less space amplification might explain this.</li></ul></ul><div>Summary of throughput over time</div><div><ul style="text-align: left;"><li>All DBMS have noise (variance) in some cases. 
Results for MyRocks aren't any worse than for Postgres or InnoDB.</li></ul></div><p></p><p></p><div><div><b>Build + Configuration</b></div><div><br />Versions tested<br /><ul style="text-align: left;"><li><span style="text-align: right;">pg162_def.cx9a2a_bee</span></li><ul><li>Postgres 16.2 and the cx9a2a_bee config</li></ul><li>my8036_rel.cz10a_bee</li><ul><li>Upstream MySQL 8.0.36 with InnoDB and the cz10a_bee config</li></ul><li><span style="text-align: right;">fbmy8028_rel_221222.cza1_bee</span></li><ul><li>MyRocks 8.0.28 from code as of 2023-12-22 at git hash 2ad105fc, RocksDB 8.7.0 at git hash 29005f0b, cza1_bee config</li><li>Compression is enabled, which saves space at the cost of more CPU</li></ul></ul></div><div>The config files <a href="https://github.com/mdcallag/mytools/tree/master/bench/conf/arc/mar24.bee.pg">are here</a>.</div></div><div><br /></div><div><b>The Benchmark</b></div><div><div><div><br /></div><div>The benchmark is run with 1 client. It is <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to.html">explained here</a> and was run in two setups:</div><div><ul><li>cached - database has 30M rows and fits in memory</li><li>IO-bound - database has 800M rows and is larger than memory</li></ul></div><div>The test server was named SER4 in the previous report. It has 8 cores, 16G RAM, Ubuntu 22.04 and XFS using 1 m.2 device.</div><div><br />The benchmark steps are:<div><p></p><div><ul><li>l.i0</li><ul><li>insert X rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client. X is 30M for cached and 800M for IO-bound.</li></ul><li>l.x</li><ul><li>create 3 secondary indexes per table. There is one connection per client.</li></ul><li>l.i1</li><ul><li>use 2 connections/client. One inserts Y rows and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). 
This step is run for a fixed number of inserts, so the run time varies depending on the insert rate. Y is 80M for cached and 4M for IO-bound.</li></ul><li>l.i2</li><ul><li>like l.i1 but each transaction modifies 5 rows (small transactions) and Y is 20M for cached and 1M for IO-bound.</li><li>Wait for X seconds after the step finishes to reduce variance during the read-write benchmark steps that follow.</li></ul><li>qr100</li><ul><li>use 3 connections/client. One does range queries for Z seconds and performance is reported for this. The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. This step is run for a fixed amount of time. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested. Z is 3600 for cached and 1800 for IO-bound.</li></ul><li>qp100</li><ul><li>like qr100 except uses point queries on the PK index</li></ul><li>qr500</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qp500</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qr1000</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul><li>qp1000</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul></ul><div><div><b>Results: throughput</b></div><div><br /></div><div>The performance reports are here <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.mem.bee.some/all.html">for cached</a> and <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/all.html">for IO-bound</a>.</div></div></div></div></div></div><div><br /></div><div><div>The summary has 3 tables. The first shows absolute throughput for each DBMS tested and benchmark step. 
The second has throughput relative to the DBMS on the first row of the table. The third shows the background insert rate for benchmark steps with background inserts and whether all systems sustained the target rates. The second table makes it easy to compare performance across DBMS.</div><div><br /></div><div>Below I use relative QPS to explain how performance differs. It is: (QPS for $me / QPS for $base) where $me is the DBMS being compared and $base is the base case. When relative QPS is > 1.0 then $me is faster than the base case. When it is < 1.0 then $me is slower. The Q in relative QPS measures: </div><div><ul><li>insert/s for l.i0, l.i1, l.i2</li><li>indexed rows/s for l.x</li><li>range queries/s for qr100, qr500, qr1000</li><li>point queries/s for qp100, qp500, qp1000</li></ul><div>Below I use colors to highlight the relative QPS values with <span style="background-color: #f4cccc;">red</span> for <= 0.95, <span style="background-color: #d9ead3;">green</span> for >= 1.05 and <span style="background-color: #eeeeee;">grey</span> for values between 0.95 and 1.05.</div><div><br /></div><div>From the summary <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.mem.bee.some/all.html#summary">for cached</a>:</div><div><ul style="text-align: left;"><li>the base case is Postgres 16.2, numbers in red mean Postgres is faster</li><li>comparing InnoDB and MyRocks with the base case</li><ul><li>l.i0</li><ul><li>InnoDB - relative QPS is <span style="background-color: #f4cccc;">0.75</span></li><li><span style="background-color: white;">MyRocks - relative QPS is </span><span style="background-color: #f4cccc;">0.77</span></li></ul><li>l.x - I ignore this for now</li><li>l.i1, l.i2</li><ul><li>InnoDB - relative QPS is <span style="background-color: #f4cccc;">0.86</span>, <span style="background-color: #d9ead3;">1.63</span></li><li>MyRocks - relative QPS is <span style="background-color: #d9ead3;">1.12</span>, <span style="background-color: 
#d9ead3;">1.47</span></li></ul><li>qr100, qr500, qr1000</li><ul><li>InnoDB - relative QPS is <span style="background-color: #f4cccc;">0.40</span>, <span style="background-color: #f4cccc;">0.42</span>, <span style="background-color: #f4cccc;">0.41</span></li><li><span style="background-color: white;">MyRocks - relative QPS is </span><span style="background-color: #f4cccc;">0.19</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.16</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.16</span></li></ul><li>qp100, qp500, qp1000</li><ul><li>InnoDB - relative QPS is <span style="background-color: #f4cccc;">0.82</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.81</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.82</span></li><li><span style="background-color: white;">MyRocks - relative QPS is </span><span style="background-color: #f4cccc;">0.71</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.70</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.69</span></li></ul></ul></ul><div><div>From the summary <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/all.html#summary">for IO-bound</a>:</div><div><ul><li>the base case is Postgres 16.2, numbers in red mean Postgres is faster</li><li>comparing InnoDB and MyRocks with the base case</li><ul><li>l.i0</li><ul><li>InnoDB - relative QPS is <span style="background-color: #f4cccc;">0.74</span></li><li><span style="background-color: white;">MyRocks - relative QPS is </span><span style="background-color: #f4cccc;">0.77</span></li></ul><li>l.x - I ignore this for now</li><li>l.i1, l.i2</li><ul><li>InnoDB - relative QPS is <span style="background-color: #f4cccc;">0.83</span>, <span style="background-color: 
#d9ead3;">18.35</span></li><li>MyRocks - relative QPS is <span style="background-color: #d9ead3;">11.45</span>, <span style="background-color: #d9ead3;">73.55</span></li></ul><li>qr100, qr500, qr1000</li><ul><li>InnoDB - relative QPS is <span style="background-color: #f4cccc;">0.42</span>, <span style="background-color: #f4cccc;">0.47</span>, <span style="background-color: #f4cccc;">0.55</span></li><li><span style="background-color: white;">MyRocks - relative QPS is </span><span style="background-color: #f4cccc;">0.07</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.06</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.06</span></li></ul><li>qp100, qp500, qp1000</li><ul><li>InnoDB - relative QPS is <span style="background-color: #d9ead3;">1.56</span>, <span style="background-color: #d9ead3;">1.46</span>, <span style="background-color: #d9ead3;">1.44</span></li><li><span style="background-color: white;">MyRocks - relative QPS is </span><span style="background-color: #d9ead3;">2.15</span><span style="background-color: white;">, </span><span style="background-color: #d9ead3;">2.13</span><span style="background-color: white;">, </span><span style="background-color: #d9ead3;">2.21</span></li><li><span style="background-color: white;">Postgres failed to sustain the target write rate during qp1000. The target was ~1000/s and it sustained 927/s.</span></li></ul></ul></ul></div></div></div><div><b>Results: efficiency</b></div></div></div></div><div><br />Here I focus on the results from the IO-bound workload. The <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/all.html#l.i0.metrics">efficiency section</a> of the IO-bound perf report has a lot of information.</div><div><br /></div><div>At test end (after qp1000.L6) the database size in GB is 192.6 for Postgres, 166.4 for InnoDB and 54.8 for MyRocks. 
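As a quick sanity check, those sizes translate into space-amplification ratios relative to MyRocks (a minimal Python sketch; the only inputs are the sizes reported above):

```python
# Database sizes in GB at test end (after qp1000.L6), from the report above
sizes_gb = {"Postgres": 192.6, "InnoDB": 166.4, "MyRocks": 54.8}

base = sizes_gb["MyRocks"]
for dbms, size_gb in sizes_gb.items():
    # Ratio of each DBMS's database size to the MyRocks size
    print(f"{dbms}: {size_gb / base:.2f}X")
# Postgres: 3.51X, InnoDB: 3.04X, MyRocks: 1.00X
```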
Compared to MyRocks, Postgres uses ~3.5X more space and InnoDB uses ~3X more space. Compression is enabled for MyRocks, which saves space at the cost of more CPU.</div><div><br /></div><div>Explaining l.i0 - load in key order</div><div><ul style="text-align: left;"><li>Data <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/all.html#l.i0.metrics">is here</a></li><li>Postgres uses the least CPU per statement (see cpupq, CPU per query). It is ~1.2X larger with InnoDB and MyRocks. CPU probably explains the perf difference.</li><li>MyRocks writes the least to storage per statement (see wkbpi, KB written per insert)</li></ul><div>Explaining l.i1 - write-only, 50 rows/commit</div></div><div><ul style="text-align: left;"><li>Data <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/all.html#l.i1.metrics">is here</a></li><li>MyRocks does the fewest reads from storage per statement (see rpq, reads per query). The rate is ~278X larger for Postgres and ~859X larger for InnoDB. The ratio is so large because non-unique secondary index maintenance <a href="https://smalldatum.blogspot.com/2017/09/write-heavy-workloads-with-myrocks.html">is read free</a> for MyRocks. The <a href="https://smalldatum.blogspot.com/2023/02/the-value-of-innodb-change-buffer.html">InnoDB change buffer</a> provides a similar but less significant benefit (I enabled the change buffer for these tests). Alas, with Postgres the leaf pages for secondary indexes must undergo read-modify-write as the <a href="https://www.postgresql.org/docs/current/storage-hot.html">heap-only tuple optimization</a> can't be used for this schema.</li><li>MyRocks uses the least CPU per statement (see cpupq, CPU per query). It is ~3X larger with Postgres and ~5X larger with InnoDB.</li><li>MyRocks has the best write efficiency (see wkbpi, KB written to storage per insert). 
It is ~9X larger for Postgres and ~80X larger for InnoDB.</li></ul><div><div>Explaining l.i2 - write-only, 5 rows/commit</div><div><ul><li>Data <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/all.html#l.i2.metrics">is here</a></li><li>Results are similar to l.i1 above with one exception. The CPU overhead for Postgres was ~3X larger than MyRocks for l.i1 but here it is more than 20X larger because of <a href="https://www.google.com/search?q=site%3Asmalldatum.blogspot.com+get_actual_variable_range">the problem</a> with the optimizer spending too much time in get_actual_variable_range.</li></ul><div>Explaining range queries - qr100, qr500, qr1000</div></div></div></div><div><ul style="text-align: left;"><li>Data <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/all.html#qr100.L1.metrics">is here</a></li><li>The read IO overhead is similar for Postgres and MyRocks (see rpq, reads per query) while it is ~8X larger for InnoDB. A standard hand-waving analysis would predict that MyRocks wasn't going to be as read IO efficient as Postgres, but prefix bloom filters and less space amplification help it here.</li><li>Postgres has the smallest CPU overhead (see cpupq, CPU per query). It is ~2.6X larger for InnoDB and ~15X larger for MyRocks. I hope to explain why MyRocks uses so much more CPU.</li></ul><div>Explaining point queries - qp100, qp500, qp1000</div></div><div><ul style="text-align: left;"><li>Data <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/all.html#qp100.L2.metrics">is here</a></li><li>MyRocks has the best read IO efficiency (see rpq, reads per query). It is ~2.2X and ~1.3X larger for Postgres and InnoDB. 
Bloom filters and less space amplification might explain this.</li><li>All DBMS have a similar CPU overhead (see cpupq, CPU per query).</li></ul><div><b>Results: throughput over time</b></div></div><div><br /></div><div>Explaining l.i0 - load in key order</div><div><ul style="text-align: left;"><li>Data is here for <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.l.i0.html#pg162_def.cx9a2a_bee.ips">Postgres</a>, <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.l.i0.html#my8036_rel.cz10a_bee.ips">InnoDB</a> and <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.l.i0.html#fbmy8028_rel_221222.cza1_bee.ips">MyRocks</a></li><li>Results are stable for all DBMS but MyRocks has the most noise</li></ul><div><div>Explaining l.i1 - write-only, 50 rows/commit</div><div><ul><li>Data is here for <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.l.i1.html#pg162_def.cx9a2a_bee.ips">Postgres</a>, <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.l.i1.html#my8036_rel.cz10a_bee.ips">InnoDB</a> and <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.l.i1.html#fbmy8028_rel_221222.cza1_bee.ips">MyRocks</a></li><li>The insert/s rate declines over time for Postgres, which is expected, but grows over time for InnoDB from ~1000/s to ~3000/s. 
I assume that InnoDB initially suffers more from page splits and performance improves as they become less frequent over time.</li></ul><div><div>Explaining l.i2 - write-only, 5 rows/commit</div><div><ul><li>Data is here for <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.l.i2.html#pg162_def.cx9a2a_bee.ips">Postgres</a>, <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.l.i2.html#my8036_rel.cz10a_bee.ips">InnoDB</a> and <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.l.i2.html#fbmy8028_rel_221222.cza1_bee.ips">MyRocks</a></li><li>The insert/s and delete/s rates for Postgres decrease slightly over time. I assume the issue is that the optimizer CPU overhead for delete statements grows over time, which is apparent on the chart for max delete response time (<a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.l.i2.html#pg162_def.cx9a2a_bee.ips">start here</a> and scroll down).</li></ul><div>Explaining range queries - qr100, qr500, qr1000</div></div></div></div><div><ul style="text-align: left;"><li>Data is here for <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qr100.L1.html">qr100</a>, <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qr500.L3.html">qr500</a> and <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qr1000.L5.html">qr1000</a></li><li>For qr100 measured at 1-second intervals</li><ul><li>For Postgres the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qr100.L1.html#pg162_def.cx9a2a_bee.qps">query rate</a> has noise (ranges between 8000/s and 12000/s), the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qr100.L1.html#pg162_def.cx9a2a_bee.imax">max insert response time</a> is noisy and the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qr100.L1.html#pg162_def.cx9a2a_bee.dmax">max delete 
response time</a> is stable but grows over time.</li><li>For InnoDB the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qr100.L1.html#my8036_rel.cz10a_bee.qps">query rate</a> is stable, the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qr100.L1.html#my8036_rel.cz10a_bee.imax">max insert response time</a> is noisy and the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qr100.L1.html#my8036_rel.cz10a_bee.dmax">max delete response time</a> is stable.</li><li>For MyRocks the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qr100.L1.html#fbmy8028_rel_221222.cza1_bee.qps">query rate</a> has noise (ranges between 500/s and 600/s), the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qr100.L1.html#fbmy8028_rel_221222.cza1_bee.imax">max insert response time</a> is stable and the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qr100.L1.html#fbmy8028_rel_221222.cza1_bee.dmax">max delete response time</a> is stable.</li></ul><li>For qr1000 measured at 1-second intervals</li><ul><li>For Postgres the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qr1000.L5.html#pg162_def.cx9a2a_bee.qps">query rate</a> has noise and declines over time, the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qr1000.L5.html#pg162_def.cx9a2a_bee.imax">max insert response time</a> is noisy and the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qr1000.L5.html#pg162_def.cx9a2a_bee.dmax">max delete response time</a> is stable but grows over time.</li><li>For InnoDB the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qr1000.L5.html#my8036_rel.cz10a_bee.qps">query rate</a> is stable, the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qr1000.L5.html#my8036_rel.cz10a_bee.imax">max 
insert response time</a> is noisy and the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qr1000.L5.html#my8036_rel.cz10a_bee.dmax">max delete response time</a> is noisy.</li><li>For MyRocks the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qr1000.L5.html#fbmy8028_rel_221222.cza1_bee.qps">query rate</a> is noisy, the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qr1000.L5.html#fbmy8028_rel_221222.cza1_bee.imax">max insert response time</a> is noisy and the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qr1000.L5.html#fbmy8028_rel_221222.cza1_bee.dmax">max delete response time</a> is noisy.</li></ul></ul></div><div><div>Explaining point queries - qp100, qp500, qp1000</div></div><div><ul style="text-align: left;"><li>Data is here for <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qp100.L2.html">qp100</a>, <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qp500.L4.html">qp500</a> and <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qp1000.L6.html">qp1000</a></li><li>For qp100 measured at 1-second intervals</li><ul><li>For Postgres the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qp100.L2.html#pg162_def.cx9a2a_bee.qps">query rate</a> has noise, the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qp100.L2.html#pg162_def.cx9a2a_bee.imax">max insert response time</a> has noise, the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qp100.L2.html#pg162_def.cx9a2a_bee.dmax">max delete response time</a> has noise and is growing</li><li>For InnoDB the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qp100.L2.html#my8036_rel.cz10a_bee.qps">query rate</a> is stable, the <a 
href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qp100.L2.html#my8036_rel.cz10a_bee.imax">max insert response time</a> has noise, the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qp100.L2.html#my8036_rel.cz10a_bee.dmax">max delete response time</a> is stable</li><li>For MyRocks the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qp100.L2.html#fbmy8028_rel_221222.cza1_bee.qps">query rate</a> is stable, the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qp100.L2.html#fbmy8028_rel_221222.cza1_bee.imax">max insert response time</a> has noise, the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qp100.L2.html#fbmy8028_rel_221222.cza1_bee.dmax">max delete response time</a> has noise</li></ul><li>For qp1000 measured at 1-second intervals</li><ul><li>For Postgres the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qp1000.L6.html#pg162_def.cx9a2a_bee.qps">query rate</a> has noise, the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qp1000.L6.html#pg162_def.cx9a2a_bee.imax">max insert response time</a> has too much noise especially from 1200s to 1600s which explains why it failed to sustain the target insert and delete rates, the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qp1000.L6.html#pg162_def.cx9a2a_bee.dmax">max delete response time</a> is stable and growing.</li><li>For InnoDB the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qp1000.L6.html#my8036_rel.cz10a_bee.qps">query rate</a> is stable, the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qp1000.L6.html#my8036_rel.cz10a_bee.imax">max insert response time</a> has too much noise, the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qp1000.L6.html#my8036_rel.cz10a_bee.dmax">max delete 
response time</a> has much noise</li><li>For MyRocks the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qp1000.L6.html#fbmy8028_rel_221222.cza1_bee.qps">query rate</a> is stable, the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qp1000.L6.html#fbmy8028_rel_221222.cza1_bee.imax">max insert response time</a> has noise, the <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.some/tput.qp1000.L6.html#fbmy8028_rel_221222.cza1_bee.dmax">max delete response time</a> has noise</li></ul></ul><div><br /></div></div></div></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div>Mark Callaghanhttp://www.blogger.com/profile/09590445221922043181noreply@blogger.com0tag:blogger.com,1999:blog-9149523927864751087.post-60663765973636363432024-03-17T18:51:00.000-07:002024-03-18T08:45:25.562-07:00Yet another Insert Benchmark result: MyRocks, MySQL and a small server<p>While trying to explain a Postgres performance problem I repeated the <a href="https://smalldatum.blogspot.com/2023/12/updates-for-insert-benchmark-december.html">Insert Benchmark</a> on a small server for MyRocks from MySQL 5.6 and 8.0. This post explains those results. 
The previous report for a cached workload <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-myrocks-56-and_12.html">is here</a>.</p><p>tl;dr</p><p></p><ul style="text-align: left;"><li>Disclaimers</li><ul><li>The low-concurrency results here are worse than the results from a bigger server with more concurrency because results here depend more on CPU overheads and MySQL code paths keep growing, while on the bigger server the cost of new CPU overheads is offset by other improvements.</li><li>Some of the regressions here are similar to what I measure for InnoDB and the problem is likely code above the storage engine layer.</li><li>For MyRocks 8.0.28 compared to 5.6.35</li><ul><li>Results for most benchmark steps aren't surprising: MyRocks 8.0.28 gets between 80% and 95% of the throughput compared to MyRocks 5.6.35</li><li>Results for the qr1000.L5 benchmark step with the IO-bound workload are odd. MyRocks 8.0.28 gets only 39% of the throughput compared to MyRocks 5.6.35. <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.fbmy/all.html#qr1000.L5.metrics">From the metrics</a> I see that MyRocks 8.0.28 does ~2X more read IO/query (see rpq) and uses ~2X more CPU/query (see cpupq). I have yet to explain this.</li></ul></ul></ul><p></p><div><b>Build + Configuration</b></div><div><br />This report has results for MyRocks 5.6.35 and 8.0.28. 
The cza1_bee config was used and it <a href="https://github.com/mdcallag/mytools/tree/master/bench/conf/arc/mar24.bee.pg">is here</a>.</div><div><br /></div><div>The builds tested are:</div><div><ul style="text-align: left;"><li>fbmy5635_rel_202203072101.cza1_bee</li><ul><li>MyRocks 5.6.35 from code as of 2022-03-07 at git hash e7d976ee with RocksDB 6.28.2, cza1_bee config</li></ul><li>fbmy5635_rel_20230529_850.cza1_bee</li><ul><li>MyRocks 5.6.35 from code as of 2023-05-29 at git hash b739eac1 with RocksDB 8.5.0, cza1_bee config</li></ul><li>fbmy8028_rel_20220829_752.cza1_bee</li><ul><li>MyRocks 8.0.28 from code as of 2022-08-29 at git hash a35c8dfeab, RocksDB 7.5.2, cza1_bee config</li></ul><li>fbmy8028_rel_20230619_831.cza1_bee</li><ul><li>MyRocks 8.0.28 from code as of 2023-06-19 at git hash 6164cf0274, RocksDB 8.3.1, cza1_bee config</li></ul><li>fbmy8028_rel_221222.cza1_bee</li><ul><li>MyRocks 8.0.28 from code as of 2023-12-22 at git hash 2ad105fc, RocksDB 8.7.0 at git hash 29005f0b, cza1_bee config</li></ul><li>fbmy8028_rel_231222_870.cza1_bee_cfx</li><ul><li>MyRocks 8.0.28 from code as of 2023-12-22 at git hash 2ad105fc, RocksDB 8.7.0 at git hash 29005f0b, cza1_bee config, indexes use a separate column family</li></ul></ul><div><div><b>The Benchmark</b></div><div><div><br /></div><div>The benchmark is run with 1 client. It is <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to.html">explained here</a> and was run in two setups:</div><div><ul><li>cached - database has 30M rows and fits in memory</li><li>IO-bound - database has 800M rows and is larger than memory</li></ul></div><div>The test server was named SER4 in the previous report. It has 8 cores, 16G RAM, Ubuntu 22.04 and XFS using 1 m.2 device.</div><div><br />The benchmark steps are:<div><p></p><div><ul><li>l.i0</li><ul><li>insert X rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client. 
X is 30M for cached and 800M for IO-bound.</li></ul><li>l.x</li><ul><li>create 3 secondary indexes per table. There is one connection per client.</li></ul><li>l.i1</li><ul><li>use 2 connections/client. One inserts Y rows and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate. Y is 80M for cached and 4M for IO-bound.</li></ul><li>l.i2</li><ul><li>like l.i1 but each transaction modifies 5 rows (small transactions) and Y is 20M for cached and 1M for IO-bound.</li><li>Wait for X seconds after the step finishes to reduce variance during the read-write benchmark steps that follow.</li></ul><li>qr100</li><ul><li>use 3 connections/client. One does range queries for Z seconds and performance is reported for this. The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. This step is run for a fixed amount of time. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested. 
Z is 3600 for cached and 1800 for IO-bound.</li></ul><li>qp100</li><ul><li>like qr100 except uses point queries on the PK index</li></ul><li>qr500</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qp500</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qr1000</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul><li>qp1000</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul></ul><div><div><b>Results</b></div><div><br /></div><div>The performance reports are here <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.mem.bee.fbmy/all.html">for cached</a> and <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.fbmy/all.html">for IO-bound</a>.</div></div></div></div></div></div><div><br /></div><div><div>The summary has 3 tables. The first shows absolute throughput by DBMS tested X benchmark step. The second has throughput relative to the version on the first row of the table. The third shows the background insert rate for benchmark steps with background inserts and all systems sustained the target rates. The second table makes it easy to see how performance changes over time.</div><div><br /></div><div>Below I use relative QPS to explain how performance changes. It is: (QPS for $me / QPS for $base) where $me is my version and $base is the version of the base case. When relative QPS is > 1.0 then performance improved over time. When it is < 1.0 then there are regressions. 
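As a sketch of this calculation (the function names are mine, not part of the benchmark tooling; the thresholds are the ones used for the color coding in the summaries):

```python
def relative_qps(qps_me: float, qps_base: float) -> float:
    """Relative QPS = (QPS for $me / QPS for $base)."""
    return qps_me / qps_base

def color(rel_qps: float) -> str:
    """Color coding for relative QPS in the summary tables."""
    if rel_qps <= 0.95:
        return "red"    # regression
    if rel_qps >= 1.05:
        return "green"  # improvement
    return "grey"       # within the noise

# Example with made-up throughput numbers:
rel = relative_qps(720.0, 1000.0)
print(rel, color(rel))  # 0.72 red
```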
The Q in relative QPS measures: </div><div><ul><li>insert/s for l.i0, l.i1, l.i2</li><li>indexed rows/s for l.x</li><li>range queries/s for qr100, qr500, qr1000</li><li>point queries/s for qp100, qp500, qp1000</li></ul><div>Below I use colors to highlight the relative QPS values with <span style="background-color: #f4cccc;">red</span> for <= 0.95, <span style="background-color: #d9ead3;">green</span> for >= 1.05 and <span style="background-color: #eeeeee;">grey</span> for values between 0.95 and 1.05.</div><div><br /></div><div>From the summary <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.mem.bee.fbmy/all.html#summary">for cached</a>:</div><div><ul style="text-align: left;"><li>the base case is fbmy5635_rel_202203072101 (MyRocks 5.6.35 from 2022)</li><li>comparing fbmy8028_rel_231222_870 (latest MyRocks 8.0.28) with the base case</li><ul><li>l.i0</li><ul><li>relative QPS is <span style="background-color: #f4cccc;">0.72</span><span style="background-color: white;"> in </span>fbmy8028_rel_231222_870</li></ul><li>l.x - I ignore this for now</li><li>l.i1, l.i2</li><ul><li>relative QPS is <span style="background-color: #f4cccc;">0.85</span>, <span style="background-color: #f4cccc;">0.82 </span><span style="background-color: white;">in </span>fbmy8028_rel_231222_870</li></ul><li>qr100, qr500, qr1000</li><ul><li>relative QPS is <span style="background-color: #d9ead3;">1.06</span>, <span style="background-color: #d9ead3;">1.24</span>, <span style="background-color: #d9ead3;">1.06</span> <span style="background-color: white;">in </span>fbmy8028_rel_231222_870</li></ul><li>qp100, qp500, qp1000</li><ul><li>relative QPS is <span style="background-color: #eeeeee;">0.96</span>, <span style="background-color: #f4cccc;">0.95</span>, <span style="background-color: #eeeeee;">0.96</span> <span style="background-color: white;">in </span>fbmy8028_rel_231222_870</li></ul></ul></ul><div><div>From the summary <a 
href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.fbmy/all.html#summary">for IO-bound</a>:</div><div><ul style="text-align: left;"><li>the base case is fbmy5635_rel_202203072101 (MyRocks 5.6.35 from 2022)</li><li>comparing fbmy8028_rel_231222_870 (latest MyRocks 8.0.28) with the base case</li><ul><li>l.i0</li><ul><li>relative QPS is <span style="background-color: #f4cccc;">0.72</span><span style="background-color: white;"> in </span>fbmy8028_rel_231222_870</li></ul><li>l.x - I ignore this for now</li><li>l.i1, l.i2</li><ul><li>relative QPS is <span style="background-color: #f4cccc;">0.86</span>, <span style="background-color: #f4cccc;">0.80 </span><span style="background-color: white;">in </span>fbmy8028_rel_231222_870</li></ul><li>qr100, qr500, qr1000</li><ul><li>relative QPS is <span style="background-color: #f4cccc;">0.80</span>, <span style="background-color: #f4cccc;">0.80</span>, <span style="background-color: #f4cccc;">0.39</span> <span style="background-color: white;">in </span>fbmy8028_rel_231222_870</li><li>the 0.39 value is an outlier. <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.fbmy/all.html#qr1000.L5.metrics">From the metrics</a> I see that MyRocks 8.0.28 does ~2X more read IO/query (see rpq) and uses ~2X more CPU/query (see cpupq). 
I have yet to explain this.</li></ul><li>qp100, qp500, qp1000</li><ul><li>relative QPS is <span style="background-color: #f4cccc;">0.94</span>, <span style="background-color: #f4cccc;">0.94</span>, <span style="background-color: #f4cccc;">0.94</span> <span style="background-color: white;">in </span>fbmy8028_rel_231222_870</li></ul></ul></ul></div></div></div><div><br /></div></div></div></div></div>Mark Callaghanhttp://www.blogger.com/profile/09590445221922043181noreply@blogger.com0tag:blogger.com,1999:blog-9149523927864751087.post-56102891640153116512024-03-17T18:22:00.000-07:002024-03-18T08:45:33.068-07:00Yet another Insert Benchmark result: MySQL, InnoDB and a small server<p>While trying to explain a Postgres performance problem I repeated the <a href="https://smalldatum.blogspot.com/2023/12/updates-for-insert-benchmark-december.html">Insert Benchmark</a> on a small server for InnoDB from MySQL 5.6, 5.7 and 8.0. This post explains those results. Previous reports are here for <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-innodbmysql-56.html">cached</a> and <a href="https://smalldatum.blogspot.com/2024/02/updated-insert-benchmark-innodbmysql-56.html">IO-bound</a> workloads and the results here are similar.</p><p>tl;dr</p><p></p><ul style="text-align: left;"><li>Disclaimer - the low-concurrency results here are worse than the results from a bigger server with more concurrency because the result here depends more on CPU overheads and MySQL keeps on growing code paths, while on the bigger server the cost from new CPU overheads is offset by other improvements.</li><li>There are significant regressions from 5.6 to 5.7 and again from 5.7 to 8.0</li></ul><div><b>Build + Configuration</b></div><div><br />This report has results for InnoDB with MySQL 5.6.51, 5.7.44 and 8.0.36. 
The cz10a_bee config was used and it <a href="https://github.com/mdcallag/mytools/tree/master/bench/conf/arc/mar24.bee.pg">is here</a>.</div><p></p><div><b>The Benchmark</b></div><div><div><br /></div><div>The benchmark is run with 1 client. It is <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to.html">explained here</a> and was run in two setups</div><div><ul><li>cached - database has 30M rows and fits in memory</li><li>IO-bound - database has 800M rows and is larger than memory.</li></ul></div><div>The test server was named SER4 in the previous report. It has 8 cores, 16G RAM, Ubuntu 22.04 and XFS using 1 m.2 device.</div><div><br />The benchmark steps are:<div><p></p><div><ul><li>l.i0</li><ul><li>insert X million rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client. X is 30M for cached and 800M for IO-bound.</li></ul><li>l.x</li><ul><li>create 3 secondary indexes per table. There is one connection per client.</li></ul><li>l.i1</li><ul><li>use 2 connections/client. One inserts Y rows and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate. Y is 80M for cached and 4M for IO-bound.</li></ul><li>l.i2</li><ul><li>like l.i1 but each transaction modifies 5 rows (small transactions) and Y is 20M for cached and 1M for IO-bound.</li><li>Wait for X seconds after the step finishes to reduce variance during the read-write benchmark steps that follow.</li></ul><li>qr100</li><ul><li>use 3 connections/client. One does range queries for Z seconds and performance is reported for this. The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. This step is run for a fixed amount of time. 
If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested. Z is 3600 for cached and 1800 for IO-bound.</li></ul><li>qp100</li><ul><li>like qr100 except uses point queries on the PK index</li></ul><li>qr500</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qp500</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qr1000</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul><li>qp1000</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul></ul><div><div><b>Results</b></div><div><br /></div><div>The performance reports are here <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.mem.bee.inno/all.html">for cached</a> and <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.inno/all.html">for IO-bound</a>.</div></div></div></div></div></div><div><br /></div><div><div>The summary has 3 tables. The first shows absolute throughput by DBMS tested X benchmark step. The second has throughput relative to the version on the first row of the table. The third shows the background insert rate for benchmark steps with background inserts and all systems sustained the target rates. The second table makes it easy to see how performance changes over time.</div><div><br /></div><div>Below I use relative QPS to explain how performance changes. It is: (QPS for $me / QPS for $base) where $me is my version and $base is the version of the base case. When relative QPS is > 1.0 then performance improved over time. When it is < 1.0 then there are regressions. 
The Q in relative QPS measures: </div><div><ul><li>insert/s for l.i0, l.i1, l.i2</li><li>indexed rows/s for l.x</li><li>range queries/s for qr100, qr500, qr1000</li><li>point queries/s for qp100, qp500, qp1000</li></ul><div>Below I use colors to highlight the relative QPS values with <span style="background-color: #f4cccc;">red</span> for <= 0.95, <span style="background-color: #d9ead3;">green</span> for >= 1.05 and <span style="background-color: #eeeeee;">grey</span> for values between 0.95 and 1.05.</div><div><br /></div><div>From the summary <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.mem.bee.inno/all.html#summary">for cached</a>:</div><div><ul style="text-align: left;"><li>the base case is MySQL 5.6.51</li><li>comparing 5.7.44 and 8.0.36 with 5.6.51 shows large regressions</li><ul><li>l.i0</li><ul><li>relative QPS is <span style="background-color: #f4cccc;">0.84</span> in 5.7.44</li><li>relative QPS is <span style="background-color: #f4cccc;">0.57</span> in 8.0.36</li></ul><li>l.x - I ignore this for now</li><li>l.i1, l.i2</li><ul><li>relative QPS is <span style="background-color: #d9ead3;">1.11</span>, <span style="background-color: #f4cccc;">0.88</span> in 5.7.44</li><li>relative QPS is <span style="background-color: #f4cccc;">0.91</span>, <span style="background-color: #f4cccc;">0.73</span> in 8.0.36</li></ul><li>qr100, qr500, qr1000</li><ul><li>relative QPS is <span style="background-color: #f4cccc;">0.73</span>, <span style="background-color: #f4cccc;">0.72</span>, <span style="background-color: #f4cccc;">0.74</span> in 5.7.44</li><li>relative QPS is <span style="background-color: #f4cccc;">0.63</span>, <span style="background-color: #f4cccc;">0.63</span>, <span style="background-color: #f4cccc;">0.63</span> in 8.0.36</li></ul><li>qp100, qp500, qp1000</li><ul><li>relative QPS is <span style="background-color: #f4cccc;">0.83</span>, <span style="background-color: #f4cccc;">0.83</span>, <span style="background-color: #f4cccc;">0.82</span> in 
5.7.44</li><li>relative QPS is <span style="background-color: #f4cccc;">0.63</span>, <span style="background-color: #f4cccc;">0.61</span>, <span style="background-color: #f4cccc;">0.62</span> in 8.0.36</li></ul></ul></ul></div><div>From the summary <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.inno/all.html#summary">for IO-bound</a>:</div></div></div><div><ul style="text-align: left;"><li>the base case is MySQL 5.6.51</li><li>comparing 5.7.44 and 8.0.36 with 5.6.51 shows large regressions</li><ul><li>l.i0</li><ul><li>relative QPS is <span style="background-color: #f4cccc;">0.86</span> in 5.7.44</li><li>relative QPS is <span style="background-color: #f4cccc;">0.59</span> in 8.0.36</li></ul><li>l.x - I ignore this for now</li><li>l.i1, l.i2</li><ul><li>relative QPS is <span style="background-color: #d9ead3;">1.30</span>, <span style="background-color: #d9ead3;">1.26</span> in 5.7.44</li><li>relative QPS is <span style="background-color: #d9ead3;">1.30</span>, <span style="background-color: #d9ead3;">1.16</span> in 8.0.36</li></ul><li>qr100, qr500, qr1000</li><ul><li>relative QPS is <span style="background-color: #f4cccc;">0.76</span>, <span style="background-color: #f4cccc;">0.86</span>, <span style="background-color: #f4cccc;">0.94</span> in 5.7.44</li><li>relative QPS is <span style="background-color: #f4cccc;">0.70</span>, <span style="background-color: #f4cccc;">0.81</span>, <span style="background-color: #f4cccc;">0.89</span> in 8.0.36</li></ul><li>qp100, qp500, qp1000</li><ul><li>relative QPS is <span style="background-color: #eeeeee;">0.98</span>, <span style="background-color: #eeeeee;">0.99</span>, <span style="background-color: #eeeeee;">1.02</span> in 5.7.44</li><li>relative QPS is <span style="background-color: #f4cccc;">0.94</span>, <span style="background-color: #eeeeee;">0.96</span>, <span style="background-color: #eeeeee;">1.02</span> in 8.0.36</li></ul></ul></ul></div>Mark 
Callaghanhttp://www.blogger.com/profile/09590445221922043181noreply@blogger.com0tag:blogger.com,1999:blog-9149523927864751087.post-91435903966843095612024-03-17T12:42:00.000-07:002024-03-18T11:43:03.110-07:00Trying to tune Postgres for the Insert Benchmark: small server<p>Last year I spent much time trying to tune the Postgres configs I use to improve results for the <a href="https://smalldatum.blogspot.com/2023/12/updates-for-insert-benchmark-december.html">Insert Benchmark</a>. While this was a good education for me I wasn't able to get significant improvements. After writing about another perf problem with Postgres (optimizer spends too much time on DELETE statements in a special circumstance) I revisited the tuning but didn't make things significantly better.<br /><br />The results here are from Postgres 16.2 and a small server (8 CPU cores) with a low concurrency workload. Previous benchmark reports for Postgres on this setup are here for <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to_24.html">cached</a> and <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to_27.html">IO-bound</a> runs.</p><p>tl;dr</p><p></p><ul><li>I have yet to fix this problem via tuning</li></ul><div><b>The Problem</b></div><div><br />The performance problem is explained <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to_10.html">here</a> and <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to_27.html">here</a>. The issue is that the optimizer spends too much time on DELETE statements under special circumstances. 
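The delete-from-the-tail pattern that triggers this can be sketched as below. The table name (t), column name (pk_col) and constants are illustrative, not the benchmark client's actual schema:

```python
# Hypothetical sketch of the queue pattern described above: the benchmark
# inserts at the head of the PK range while deleting the same number of rows
# from the tail. Each DELETE uses PK bounds, so the optimizer probes the
# index for the actual min value and must skip dead entries left by prior
# deletes until vacuum removes them.
def next_statements(head, tail, rows_per_txn):
    values = ", ".join("({})".format(head + i) for i in range(rows_per_txn))
    insert = "INSERT INTO t (pk_col) VALUES " + values
    delete = ("DELETE FROM t WHERE pk_col > {} AND pk_col < {}"
              .format(tail - 1, tail + rows_per_txn))
    # the head and tail both advance, so the live PK range keeps moving
    return insert, delete, head + rows_per_txn, tail + rows_per_txn

ins, dele, head, tail = next_statements(head=1000, tail=0, rows_per_txn=5)
```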
In this case the optimizer reads from the index to determine the true min or max value of the column referenced in the WHERE clause. When there are too many deleted index entries that have yet to be removed by vacuum, the optimizer spends too much time scanning past them.</div><div><br /></div><div>The problem shows up on the l.i2 benchmark step. The benchmark client sustains the same rate for inserts/s and delete/s so if deletes are too slow then the insert rate will also be too slow. The ratio of delete/s (and insert/s) for l.i2 relative to l.i1 is ~0.2 for the cached workload and ~0.05 for the IO-bound workload. <br /><br />The l.i1 benchmark step deletes more rows/statement so the optimizer overhead is more significant on the l.i2 step. The ratios are much larger for InnoDB and MyRocks (they have perf problems, just not this perf problem).</div><div><br /></div><div>The circumstances are:<br /></div><div><ul style="text-align: left;"><li>the table has a queue pattern (insert to one end, delete from the other)</li><li>the DELETE statements have <i>WHERE pk_col > $low-const and pk_col < $high-const</i> where $low-const and $high-const are integer constants and there is a PK on pk_col</li></ul><div>This workload creates much MVCC garbage that is co-located in the PK index and that is a much bigger problem for Postgres than for InnoDB or MyRocks. <br /><br />I hope for a Postgres storage engine that provides MVCC without vacuum. In theory, more frequent vacuum might help and the perf overhead from frequent vacuum might be OK for the heap table given the usage of visibility bits. But when vacuum then has to do a full index scan (no visibility bits there) then that is a huge cost which limits vacuum frequency.</div></div><p></p><p><b>Build + Configuration</b></p><div><div>See the <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to.html">previous report</a> for more details. 
I used Postgres 16.2.</div><div><br />The configuration files for the SER4 server are in subdirectories <a href="https://github.com/mdcallag/mytools/tree/master/bench/conf/arc/mar24.bee.pg">from here</a>. Using the suffixes that distinguish the config file names, they are:</div><div><ul style="text-align: left;"><li>cx9a2_bee - base config</li><li>cx9a2a_bee - adds autovacuum_vacuum_cost_delay= 1ms</li><li>cx9a2b_bee - adds autovacuum_vacuum_cost_delay= 0</li><li>cx9a2c_bee - adds autovacuum_naptime= 1s</li><li>cx9a2e_bee - adds autovacuum_vacuum_scale_factor= 0.01</li><li>cx9a2f_bee - adds autovacuum_vacuum_insert_scale_factor= 0.01</li><li>cx9a2g_bee - adds autovacuum_vacuum_cost_limit= 8000</li><li>cx9a2acef_bee - combines cx9a2a, cx9a2c, cx9a2e, cx9a2f configs</li><li>cx9a2bcef_bee - combines cx9a2b, cx9a2c, cx9a2e, cx9a2f configs</li></ul></div><div><b>The Benchmark</b></div><div><br /></div><div>The benchmark is run with 1 client. It is <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to.html">explained here</a> and was run in two setups</div><div><ul style="text-align: left;"><li>cached - database has 30M rows and fits in memory</li><li>IO-bound - database has 800M rows and is larger than memory.</li></ul></div><div>The test was run on two small servers that I have at home:</div><div><ul style="text-align: left;"><li>SER4 - Beelink SER4 with 8 cores, 16G RAM, Ubuntu 22.04 and XFS using 1 m.2 device</li><li>SER7 - Beelink SER7 with 8 cores, 32G RAM, Ubuntu 22.04 and XFS using 1 m.2 device. The CPU on the SER7 is a lot faster than the SER4.</li></ul></div><div>The benchmark steps are:<div><p></p><div><ul><li>l.i0</li><ul><li>insert X million rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client. For SER4, X is 30M for cached and 800M for IO-bound. 
For SER7, X is 60M for cached and 800M for IO-bound.</li></ul><li>l.x</li><ul><li>create 3 secondary indexes per table. There is one connection per client.</li></ul><li>l.i1</li><ul><li>use 2 connections/client. One inserts Y rows and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate. Y is 80M for cached and 4M for IO-bound.</li></ul><li>l.i2</li><ul><li>like l.i1 but each transaction modifies 5 rows (small transactions) and Y is 20M for cached and 1M for IO-bound.</li><li>Wait for X seconds after the step finishes to reduce variance during the read-write benchmark steps that follow.</li></ul><li>qr100</li><ul><li>use 3 connections/client. One does range queries for Z seconds and performance is reported for this. The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. This step is run for a fixed amount of time. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested. 
Z is 3600 for cached and 1800 for IO-bound.</li></ul><li>qp100</li><ul><li>like qr100 except uses point queries on the PK index</li></ul><li>qr500</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qp500</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qr1000</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul><li>qp1000</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul></ul><div><div><b>Results: SER4 server</b></div><div><br /></div><div>The performance reports are here <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.mem.bee.pg/all.html#summary">for cached</a> and <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.pg/all.html#summary">for IO-bound</a>.</div><div><br /></div><div>The summary has 3 tables. The first shows absolute throughput by DBMS tested X benchmark step. The second has throughput relative to the version on the first row of the table. The third shows the background insert rate for benchmark steps with background inserts and all systems sustained the target rates. The second table makes it easy to see how performance changes over time.</div><div><br /></div><div>Below I use relative QPS to explain how performance changes. It is: (QPS for $me / QPS for $base) where $me is my version and $base is the version of the base case. When relative QPS is > 1.0 then performance improved over time. When it is < 1.0 then there are regressions. 
The Q in relative QPS measures: </div><div><ul><li>insert/s for l.i0, l.i1, l.i2</li><li>indexed rows/s for l.x</li><li>range queries/s for qr100, qr500, qr1000</li><li>point queries/s for qp100, qp500, qp1000</li></ul><div>Below I use colors to highlight the relative QPS values with <span style="background-color: #f4cccc;">red</span> for <= 0.95, <span style="background-color: #d9ead3;">green</span> for >= 1.05 and <span style="background-color: #eeeeee;">grey</span> for values between 0.95 and 1.05.</div><div><br /></div><div>From the summaries for <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.mem.bee.pg/all.html#summary">cached</a> and for <a href="https://mdcallag.github.io/reports/24_03_17.1u.1tno.io.bee.pg/all.html#summary">IO-bound</a>:</div></div><div><ul style="text-align: left;"><li>The base case uses the cx9a2_bee config</li><li>The different config files have no impact on performance for the l.i0 and l.x benchmark steps. They have a small impact for the qr* and qp* (read+write) benchmark steps. Because the impact is non-existent to small I ignore those to focus on l.i1 and l.i2.</li></ul><div>For l.i1 and l.i2 with a cached workload the different config files have some impact</div><ul style="text-align: left;"><li>The relative QPS, where Q means delete (and insert), ranges from 0.76 to 1.34 meaning a few made things slower and the best improved the delete/s rate by ~1.34X</li><li>The delete/s ratio for l.i2 vs l.i1 is 0.221 for the base case and the best improvement might be from the cx9a2f_bee config where the ratio increases to 0.265. 
But I was hoping to improve the ratio to 0.5 or larger so I was disappointed.</li></ul><div><div>For l.i1 and l.i2 with an IO-bound workload the different config files have no benefit</div><ul><li>Postgres 16.2 does ~2000 delete/s for the l.i1 step vs ~100/s for the l.i2 step</li></ul><div><div><b>Results: SER7 server</b></div><div><br /></div><div>The performance reports are here <a href="https://mdcallag.github.io/reports/24_03_18.1u.1tno.mem.ser7.pg/all.html">for cached</a> and <a href="https://mdcallag.github.io/reports/24_03_18.1u.1tno.io.ser7.pg/all.html">for IO-bound</a>. Results from the SER7 match results from the SER4 described above so I won't explain them.</div></div></div></div></div></div></div></div></div><div><br /></div>Mark Callaghanhttp://www.blogger.com/profile/09590445221922043181noreply@blogger.com0tag:blogger.com,1999:blog-9149523927864751087.post-76412388786064965992024-02-19T13:10:00.000-08:002024-02-20T08:13:06.952-08:00Perf regressions in Postgres from 9.0 to 16 with sysbench and a small server<p>This has results for sysbench vs Postgres on a small server. I have results for versions from 9.0 through 16. My <a href="https://smalldatum.blogspot.com/2023/09/perf-regressions-in-mysql-from-5621-to_18.html">last report</a> only went back to Postgres 11. The goal is to document where things get faster or slower over time for a low-concurrency and CPU-bound workload. The focus is on CPU regressions. 
</p><p>My results here aren't universal, but you have to start somewhere:</p><p></p><ul><li>The microbenchmarks here mostly measure CPU overheads</li><li>Things won't look the same with an IO-bound workload</li><li>Things won't look the same with a workload that has more concurrency </li><li>Things won't look the same with a workload that has complex queries</li></ul><div><b>Summaries</b></div><div><b><br /></b></div><div>Sections after this explain how the microbenchmark results are grouped.</div><div><br /></div><div><div>Comparing Postgres 16.2 with 9.0.23:<br /><ul><li>point query, part 1</li><ul><li>Postgres 16.2 is faster than 9.0.23 for all but one microbenchmark</li></ul><li>point query, part 2</li><ul><li>Postgres 16.2 is faster than 9.0.23 for all microbenchmarks</li></ul><li>range query, part 1 & part 2</li><ul><li>About half of the microbenchmarks are ~20% slower in 16.2 vs 9.0.23</li><li>The big regression occurs between 9.0 and 9.1</li><li>For part 2 where aggregation is done the problem is worse for shorter range scans</li></ul><li>writes</li><ul><li>Postgres 16.2 is faster than 9.0.23 for all microbenchmarks</li></ul></ul><div><br /></div><div><div>Comparing Postgres 16.2 with 10.23</div><div><ul><li>Postgres 16.2 is faster than 10.23 for all microbenchmarks</li></ul><div><br /></div><div><div>Comparing Postgres 16.2 with 14.10</div><div><ul><li>point query, part 1</li><ul><li>Postgres 16.2 is at most 4% slower than 14.10</li></ul><li>point query, part 2</li><ul><li>Postgres 16.2 is at most 1% slower than 14.10</li></ul><li>range query, part 1</li><ul><li>Postgres 16.2 is at most 5% slower than 14.10</li></ul><li>range query, part 2</li><ul><li>Postgres 16.2 is as fast or faster than 14.10</li></ul><li>writes</li><ul><li>Postgres 16.2 is at most 1% slower than 14.10</li></ul></ul></div></div></div></div></div><div><b>Build + Configuration</b></div></div><div><div><br /></div><div>I used these versions: 9.0.23, 9.1.24, 9.2.24, 9.3.25, 9.4.26, 9.5.25, 
9.6.24, 10.23, 11.22, 12.17, 13.13, 14.10, 14.11, 15.5, 15.6, 16.1 and 16.2.<br /><br />The configuration files are in the subdirectories named pg9, pg10, pg11, pg12, pg13, pg14, pg15 and pg16 <a href="https://github.com/mdcallag/mytools/tree/master/bench/conf/nuc8i7.ub1804">from here</a>. They are named <i>conf.diff.cx9a2_bee</i>.</div></div><div><br /></div><div><div><b>Benchmarks</b></div><div><br />I used sysbench and my usage is <a href="http://smalldatum.blogspot.com/2017/02/using-modern-sysbench-to-compare.html">explained here</a>. There are 42 microbenchmarks and each tests ~1 type of SQL statement and is run for 1200 seconds.</div><div><br /></div><div>Tests were run on a small server I have at home (<a href="http://smalldatum.blogspot.com/2022/10/small-servers-for-performance-testing-v4.html">see here</a>). The server is an SER4 from Beelink with 8 cores, 16G of RAM and 1 m.2 storage device with XFS and Ubuntu 22.04. The test tables are cached by Postgres.<br /><br />The benchmark is run with:<br /><ul><li>one connection</li><li>30M rows and a database cached by Postgres</li><li>each microbenchmark runs for 1200 seconds</li><li>prepared statements were enabled</li></ul></div><div>The command line was: <span style="font-family: courier;">bash r.sh 1 30000000 1200 1200 nvme0n1 1 1 1</span></div></div><div><span style="font-family: courier;"><br /></span></div><div><span><div style="font-family: Times;"><b>Results</b></div><div style="font-family: Times;"><br /></div><div style="font-family: Times;">For the results below I split the microbenchmarks into 5 groups -- 2 for point queries, 2 for range queries, 1 for writes. For the range query microbenchmarks, part 1 has queries that don't do aggregation while part 2 has queries that do aggregation. Unfortunately, I included the full scan microbenchmark (scan_range=100) in part 2 but it doesn't do aggregation. 
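The per-group summary statistics reported later (min, max, avg, median, stdev over a group's relative QPS values) can be computed as in this sketch; the sample values are hypothetical:

```python
# Sketch of the per-group summary statistics over relative QPS values.
# The input list is made up, not data from the reports.
import statistics

def summarize(rel_qps):
    return {
        "min": min(rel_qps),
        "max": max(rel_qps),
        "avg": statistics.mean(rel_qps),
        "median": statistics.median(rel_qps),
        "stdev": statistics.stdev(rel_qps),
    }

point_1 = [0.92, 1.10, 1.20, 1.25, 1.32]  # hypothetical relative QPS values
s = summarize(point_1)
```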
The spreadsheet with all data and charts <a href="https://docs.google.com/spreadsheets/d/12re8b7ZZJF8URErz1k7U85dg_hswocqNTm2MoOtG3MI/edit?usp=sharing">is here</a> and is easier to read.</div><div style="font-family: Times;"><br />All of the charts have relative throughput on the y-axis where that is (QPS for $me) / (QPS for $base), where $me is a version (for example 9.1.24) and $base is the base version. The base version is specified below and one of 9.0.23, 10.23 and 14.10 depending on what I am comparing. The y-axis doesn't start at 0 to improve readability.</div><div style="font-family: Times;"><br /></div><div style="font-family: Times;">The legend under the x-axis truncates the names I use for the microbenchmark and I don't know how to fix that other than <a href="https://docs.google.com/spreadsheets/d/12re8b7ZZJF8URErz1k7U85dg_hswocqNTm2MoOtG3MI/edit?usp=sharing">sharing the link</a> to the Google Sheet I used. Files I used to create the spreadsheets <a href="https://github.com/mdcallag/mytools/tree/master/bench/arc/feb24.1u.bee.sb.pg.mem">are here</a>.</div><div style="font-family: Times;"><br /></div><div style="font-family: Times;"><b>Results: from 9.0 through 16.2</b></div><div style="font-family: Times;"><br /></div><div style="font-family: Times;">Summary:<br /><ul style="text-align: left;"><li>point query, part 1</li><ul><li>Postgres 16.2 is faster than 9.0.23 for all but one microbenchmark</li></ul><li>point query, part 2</li><ul><li>Postgres 16.2 is faster than 9.0.23 for all microbenchmarks</li></ul><li>range query, part 1 & part 2</li><ul><li>About half of the microbenchmarks are ~20% slower in 16.2 vs 9.0.23</li><li>The big regression occurs between 9.0 and 9.1</li><li>For part 2 where aggregation is done the problem is worse for shorter range scans</li></ul><li>writes</li><ul><li>Postgres 16.2 is faster than 9.0.23 for all microbenchmarks</li></ul></ul><div>This table has summary statistics from Postgres 16.2 for each microbenchmark group. 
The numbers represent the relative QPS (relative to 9.0.23) and a value > 1 means that 16.2 is faster than 9.0.23.</div><div><br /></div><div><table border="1" cellpadding="3" cellspacing="0" style="border-collapse: collapse;"><tbody><tr><td></td><td>min</td><td>max</td><td>avg</td><td>median</td><td>stdev</td></tr><tr><td>point-1</td><td align="right">0.92</td><td align="right">1.32</td><td align="right">1.18</td><td align="right">1.20</td><td align="right">0.12</td></tr><tr><td>point-2</td><td align="right">1.07</td><td align="right">1.18</td><td align="right">1.12</td><td align="right">1.13</td><td align="right">0.04</td></tr><tr><td>range-1</td><td align="right">0.77</td><td align="right">1.68</td><td align="right">1.09</td><td align="right">1.00</td><td align="right">0.37</td></tr><tr><td>range-2</td><td align="right">0.78</td><td align="right">1.34</td><td align="right">1.01</td><td align="right">0.85</td><td align="right">0.25</td></tr><tr><td>writes</td><td align="right">1.11</td><td align="right">4.64</td><td align="right">2.21</td><td align="right">1.97</td><td align="right">1.10</td></tr></tbody></table></div></div><div class="separator" style="clear: both; font-family: courier; text-align: center;"><a 
href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhx3xLLsdbHfZQpnLTNsYiG8r3iFFVE269MTSmhPyvwfzvM8iZifIcIc52AfIPQZCa83MFuW8oG-IUyp7OJuKLtEoZyBRYzQVIYl4I4I0ICYuDkbiyZrtj5e1GqkHSA0iy_FIPi1vPm8KK3EfhgZwE0QgcWf0U2xG93D7eF2ChnLH-QB2Q2HoL32o5UFySW/s600/Point%20query,%20part%201_%20QPS%20relative%20to%20PG%209.0.23.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhx3xLLsdbHfZQpnLTNsYiG8r3iFFVE269MTSmhPyvwfzvM8iZifIcIc52AfIPQZCa83MFuW8oG-IUyp7OJuKLtEoZyBRYzQVIYl4I4I0ICYuDkbiyZrtj5e1GqkHSA0iy_FIPi1vPm8KK3EfhgZwE0QgcWf0U2xG93D7eF2ChnLH-QB2Q2HoL32o5UFySW/w640-h396/Point%20query,%20part%201_%20QPS%20relative%20to%20PG%209.0.23.png" width="640" /></a></div><div class="separator" style="clear: both; font-family: courier; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEit9YB4db8zzONIWKNdE4oh7jQmqofTGPExECkI1oJhzX0bhgGjSKxnPhJFle85d5VIbc_gPXj4iaorJK2b5MSZV5N2Q6fdB52OdQf4T-gCIYpDdRs969f6rTbT31VfSkV1GOiADABCEkwCiH4JRxSBGz8PqwR-yt27gl-nVsweMg8RS9oR5-vE5UsQWAiF/s600/Point%20query,%20part%202_%20QPS%20relative%20to%20PG%209.0.23.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEit9YB4db8zzONIWKNdE4oh7jQmqofTGPExECkI1oJhzX0bhgGjSKxnPhJFle85d5VIbc_gPXj4iaorJK2b5MSZV5N2Q6fdB52OdQf4T-gCIYpDdRs969f6rTbT31VfSkV1GOiADABCEkwCiH4JRxSBGz8PqwR-yt27gl-nVsweMg8RS9oR5-vE5UsQWAiF/w640-h396/Point%20query,%20part%202_%20QPS%20relative%20to%20PG%209.0.23.png" width="640" /></a></div><div class="separator" style="clear: both; font-family: courier; text-align: center;"><a 
href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEglv6pkRQhSlzOCCkAlk-Rh6skYYAwDNitj2oLGughIFNg23TSstLjksZ81gjD6HEWl1rIe2st8Oay9080RGCnxtJcfxd_I1xdGmG14IOII9RQzV6Rs53huNegsAM-ds4rTRAaNKVaL16PxiJpHaNXh3EFdXEl9MrL_Lv13Mlp-uwF8ZlfdTnQT_y4ga44N/s600/Range%20query,%20part%201_%20QPS%20relative%20to%20PG%209.0.23.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEglv6pkRQhSlzOCCkAlk-Rh6skYYAwDNitj2oLGughIFNg23TSstLjksZ81gjD6HEWl1rIe2st8Oay9080RGCnxtJcfxd_I1xdGmG14IOII9RQzV6Rs53huNegsAM-ds4rTRAaNKVaL16PxiJpHaNXh3EFdXEl9MrL_Lv13Mlp-uwF8ZlfdTnQT_y4ga44N/w640-h396/Range%20query,%20part%201_%20QPS%20relative%20to%20PG%209.0.23.png" width="640" /></a></div><div class="separator" style="clear: both; font-family: courier; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiE55AARbRksCszqpxvHMhoZjWerNgTJ6YUPA9gXRUgA2dGGCwRwMEidV7aKtLZ4RCd6f7ZuwPoNVufZWQhCuPuwD_mkDmDnS43bYWtMj_IglkK-yHLguZAqslKWJndz7Zi_7dTt_mjAIm0HAJbla4EhsmvzjefoHDlmmKZgx7uM3dRecgqnwhFdvCgoIaH/s600/Range%20query,%20part%202_%20QPS%20relative%20to%20PG%209.0.23.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiE55AARbRksCszqpxvHMhoZjWerNgTJ6YUPA9gXRUgA2dGGCwRwMEidV7aKtLZ4RCd6f7ZuwPoNVufZWQhCuPuwD_mkDmDnS43bYWtMj_IglkK-yHLguZAqslKWJndz7Zi_7dTt_mjAIm0HAJbla4EhsmvzjefoHDlmmKZgx7uM3dRecgqnwhFdvCgoIaH/w640-h396/Range%20query,%20part%202_%20QPS%20relative%20to%20PG%209.0.23.png" width="640" /></a></div><div class="separator" style="clear: both; font-family: courier; text-align: center;"><a 
href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjllhBRTPr5MZKGV3wOpRXaCjJJyjeiGL01vglMeTXqosfhAnE4u3110AeNuWj0pVnlN8hHr3zDF7P5o6K1xt6kgOJWEY_e0kwsL54QUvBoZI05iBwU_Kw2YEbShR8qEkSWiKOK5HDCaNQK1GdrhIqTJ8bWo6t0vCF4XLM5zY-P-eCtuz5b05FaPO8o0WQb/s600/Writes_%20QPS%20relative%20to%20PG%209.0.23.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjllhBRTPr5MZKGV3wOpRXaCjJJyjeiGL01vglMeTXqosfhAnE4u3110AeNuWj0pVnlN8hHr3zDF7P5o6K1xt6kgOJWEY_e0kwsL54QUvBoZI05iBwU_Kw2YEbShR8qEkSWiKOK5HDCaNQK1GdrhIqTJ8bWo6t0vCF4XLM5zY-P-eCtuz5b05FaPO8o0WQb/w640-h396/Writes_%20QPS%20relative%20to%20PG%209.0.23.png" width="640" /></a></div><div style="font-family: Times;"><b>Results: from 10.23 through 16.2</b></div><div style="font-family: Times;"><br /></div><div style="font-family: Times;">Summary</div><div style="font-family: Times;"><ul style="text-align: left;"><li>Postgres 16.2 is faster than 10.23 for all microbenchmarks</li></ul><div>This table has summary statistics from Postgres 16.2 for each microbenchmark group. 
The numbers represent the relative QPS (relative to 10.23) and a value > 1 means that 16.2 is faster than 10.23.</div><div><br /></div><div><table border="1" cellpadding="3" cellspacing="0" style="border-collapse: collapse;"><tbody><tr><td></td><td>min</td><td>max</td><td>avg</td><td>median</td><td>stdev</td></tr><tr><td>point-1</td><td align="right">1.02</td><td align="right">1.11</td><td align="right">1.07</td><td align="right">1.08</td><td align="right">0.03</td></tr><tr><td>point-2</td><td align="right">1.04</td><td align="right">1.08</td><td align="right">1.06</td><td align="right">1.06</td><td align="right">0.01</td></tr><tr><td>range-1</td><td align="right">1.07</td><td align="right">1.13</td><td align="right">1.10</td><td align="right">1.10</td><td align="right">0.02</td></tr><tr><td>range-2</td><td align="right">1.04</td><td align="right">1.09</td><td align="right">1.06</td><td align="right">1.05</td><td align="right">0.02</td></tr><tr><td>writes</td><td align="right">1.02</td><td align="right">1.15</td><td align="right">1.08</td><td align="right">1.06</td><td align="right">0.04</td></tr></tbody></table></div></div><div style="font-family: Times;"><div class="separator" style="clear: both; text-align: center;"><a 
href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhu360iae-DX82eB4RhUzZi8nsgVrEeRIzwRRzfze-JW-LT3hm_6yWED71MsArMMLSoox5ylxlZVRxNxso9VomhbO-W7mMm9fvTccTHhUSqrZ3ZVGMM4FK2A1OgJS-6s6hLHTpbVu-Z9PeX6k62bUCbT1Wegm3kLwEjZgRCykDLZWsMvhyphenhyphen2keSTQxkDGiTy/s600/Point%20query,%20part%201_%20QPS%20relative%20to%20PG%2010.23.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhu360iae-DX82eB4RhUzZi8nsgVrEeRIzwRRzfze-JW-LT3hm_6yWED71MsArMMLSoox5ylxlZVRxNxso9VomhbO-W7mMm9fvTccTHhUSqrZ3ZVGMM4FK2A1OgJS-6s6hLHTpbVu-Z9PeX6k62bUCbT1Wegm3kLwEjZgRCykDLZWsMvhyphenhyphen2keSTQxkDGiTy/w640-h396/Point%20query,%20part%201_%20QPS%20relative%20to%20PG%2010.23.png" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDgTQqMQMcnFyEdgMyw5FFLepxOyrFthmHe5jVL8mbNtUEWirFfja0PyAdVKtv0puoks_mwcXU4x2a024WYG5TJTVS6zcmZW2M_tO9cICS9wSNl6Ni1TOn40oqW-3BQbwrMqEPE0JqAstL1acbKraExaxVj_zQotTBfO0Zu4vmmhwIdlEH9ado35M4ajvF/s600/Point%20query,%20part%202_%20QPS%20relative%20to%20PG%2010.23.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDgTQqMQMcnFyEdgMyw5FFLepxOyrFthmHe5jVL8mbNtUEWirFfja0PyAdVKtv0puoks_mwcXU4x2a024WYG5TJTVS6zcmZW2M_tO9cICS9wSNl6Ni1TOn40oqW-3BQbwrMqEPE0JqAstL1acbKraExaxVj_zQotTBfO0Zu4vmmhwIdlEH9ado35M4ajvF/w640-h396/Point%20query,%20part%202_%20QPS%20relative%20to%20PG%2010.23.png" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"><a 
href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4IeE1BUMMXGKga6y2icv32bh2SS3kSzGxq2yold9b2MMRzwehDs8gdQhoxHKHFFNnxrNEGf4thIhIVzsxMJ-0n5J_K-k1v7zstqIV18QPmB0sevUHTpBIVubpKl_tlppZiIbY5W58PdPuT1ht5qTy4p1M8TffW2guscN2edTKv79db0Nf-kq_CY2PCRIU/s600/Range%20query,%20part%201_%20QPS%20relative%20to%20PG%2010.23.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4IeE1BUMMXGKga6y2icv32bh2SS3kSzGxq2yold9b2MMRzwehDs8gdQhoxHKHFFNnxrNEGf4thIhIVzsxMJ-0n5J_K-k1v7zstqIV18QPmB0sevUHTpBIVubpKl_tlppZiIbY5W58PdPuT1ht5qTy4p1M8TffW2guscN2edTKv79db0Nf-kq_CY2PCRIU/w640-h396/Range%20query,%20part%201_%20QPS%20relative%20to%20PG%2010.23.png" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjCRwkGN6yLExSc5c7F-t_lD4xuyyr1EHs91LE8YRoSvEilvHM4tFTqFO9Jlu0heeloAr1nIm_Jb4RDj7X5Ac47QKaM-OMjWtg1q1cwe2b1_1TyJ2rjPwuHQTs-P0PILHUxc0QxWUIWfTeF3Sbcy6F-lJ8WwPO0fYvDTphyphenhyphenUAZ89TY_3iZzwujbfnvfTPWf/s600/Range%20query,%20part%202_%20QPS%20relative%20to%20PG%2010.23.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjCRwkGN6yLExSc5c7F-t_lD4xuyyr1EHs91LE8YRoSvEilvHM4tFTqFO9Jlu0heeloAr1nIm_Jb4RDj7X5Ac47QKaM-OMjWtg1q1cwe2b1_1TyJ2rjPwuHQTs-P0PILHUxc0QxWUIWfTeF3Sbcy6F-lJ8WwPO0fYvDTphyphenhyphenUAZ89TY_3iZzwujbfnvfTPWf/w640-h396/Range%20query,%20part%202_%20QPS%20relative%20to%20PG%2010.23.png" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"><a 
href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiMei5_2jGMsv45eQ2UhL0UixSeKr8TvlVwscNuteQ-ixVEvKa8AG7PL88-A78KSXAhGiRaDAcxerKifQLfpF2uEXYsvPUHtErPeUiuQzBRgj5QtVXBHjYgU9oFnwP91dJMZpgoJTPDUNVRX59deJlNA65IirA-DL8D2yRL3DiXgMR_5MwbBukXSjHrn7SU/s600/Writes_%20QPS%20relative%20to%20PG%2010.23.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiMei5_2jGMsv45eQ2UhL0UixSeKr8TvlVwscNuteQ-ixVEvKa8AG7PL88-A78KSXAhGiRaDAcxerKifQLfpF2uEXYsvPUHtErPeUiuQzBRgj5QtVXBHjYgU9oFnwP91dJMZpgoJTPDUNVRX59deJlNA65IirA-DL8D2yRL3DiXgMR_5MwbBukXSjHrn7SU/w640-h396/Writes_%20QPS%20relative%20to%20PG%2010.23.png" width="640" /></a></div></div><div style="font-family: Times;"><br /></div><div style="font-family: Times;"><b>Results: 14.10, 14.11, 15.5, 15.6, 16.1, 16.2</b></div><div style="font-family: Times;"><br /></div><div style="font-family: Times;">Summary</div><div style="font-family: Times;"><ul style="text-align: left;"><li>point query, part 1</li><ul><li>Postgres 16.2 is at most 4% slower than 14.10</li></ul><li>point query, part 2</li><ul><li>Postgres 16.2 is at most 1% slower than 14.10</li></ul><li>range query, part 1</li><ul><li>Postgres 16.2 is at most 5% slower than 14.10</li></ul><li>range query, part 2</li><ul><li>Postgres 16.2 is as fast or faster than 14.10</li></ul><li>writes</li><ul><li>Postgres 16.2 is at most 1% slower than 14.10</li></ul></ul><div>This table has summary statistics from Postgres 16.2 for each microbenchmark group. 
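The "at most N% slower" statements in the summary above follow directly from the minimum relative QPS per group. A small sketch of that conversion, using the minimums from the 16.2 vs 14.10 comparison:

```python
# Convert a relative QPS (QPS for new version / QPS for base version)
# into a "percent slower" figure for the new version.
def pct_slower(relative_qps: float) -> float:
    """Return how much slower the new version is, as a percentage."""
    return round((1.0 - relative_qps) * 100.0, 1)

# point query, part 1: min relative QPS is 0.96 -> at most 4% slower
assert pct_slower(0.96) == 4.0
# range query, part 1: min relative QPS is 0.95 -> at most 5% slower
assert pct_slower(0.95) == 5.0
```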
The numbers represent the relative QPS (relative to 14.10) and a value > 1 means that 16.2 is faster than 14.10.<br /><br /></div><div><table border="1" cellpadding="3" cellspacing="0" style="border-collapse: collapse;"><tbody><tr><td></td><td>min</td><td>max</td><td>avg</td><td>median</td><td>stdev</td></tr><tr><td>point-1</td><td align="right">0.96</td><td align="right">1.07</td><td align="right">1.00</td><td align="right">1.00</td><td align="right">0.03</td></tr><tr><td>point-2</td><td align="right">0.99</td><td align="right">1.00</td><td align="right">1.00</td><td align="right">1.00</td><td align="right">0.01</td></tr><tr><td>range-1</td><td align="right">0.95</td><td align="right">1.00</td><td align="right">0.98</td><td align="right">0.99</td><td align="right">0.02</td></tr><tr><td>range-2</td><td align="right">1.00</td><td align="right">1.07</td><td align="right">1.02</td><td align="right">1.00</td><td align="right">0.03</td></tr><tr><td>writes</td><td align="right">0.99</td><td align="right">1.04</td><td align="right">1.01</td><td align="right">1.02</td><td align="right">0.01</td></tr></tbody></table></div></div><div class="separator" style="clear: both; font-family: courier; text-align: center;"><br /></div><div class="separator" style="clear: both; text-align: left;"><br /></div><div class="separator" style="clear: both; text-align: left;"><div class="separator" style="clear: both; text-align: center;"><a 
href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEipz0XvtjwRYEZ9curT5BwB15cTfmgid_RTxPEs9dI2mHwIdzwhTdY1fd6aXNuoN88tEzKUJKt6omjtBdzdqjotSp2kayWgmT5HT9nAwBpibNvfF3N1T4bJZekK7UL72gj_hLGWZAxHa5eKU8kKOesZkbYDiPPRJMKZdDqcQwBK3XUgztM1XFV6Dq9qkLt7/s600/Point%20query,%20part%201_%20QPS%20relative%20to%20PG%2014.10.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEipz0XvtjwRYEZ9curT5BwB15cTfmgid_RTxPEs9dI2mHwIdzwhTdY1fd6aXNuoN88tEzKUJKt6omjtBdzdqjotSp2kayWgmT5HT9nAwBpibNvfF3N1T4bJZekK7UL72gj_hLGWZAxHa5eKU8kKOesZkbYDiPPRJMKZdDqcQwBK3XUgztM1XFV6Dq9qkLt7/w640-h396/Point%20query,%20part%201_%20QPS%20relative%20to%20PG%2014.10.png" width="640" /></a></div></div><div class="separator" style="clear: both; text-align: left;"><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjpFTNBpZk3D-8FxEw9w32PaZ3I7Y8nZAjp_vm_Ooboo3NKOH3iyFac_RBorU3KioCVkf41EdWQOjSbGS6AsW0D8EnO2rfM3JhkoI2IXhjhpNaYIDjCgVdWw9HL_KNU-Ac6rJWc1kgx_yvsB-ERP1tJHKcG-o-EeKEiWngO2FxeGhGURahgga6lAqi-pmiz/s600/Point%20query,%20part%202_%20QPS%20relative%20to%20PG%2014.10.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjpFTNBpZk3D-8FxEw9w32PaZ3I7Y8nZAjp_vm_Ooboo3NKOH3iyFac_RBorU3KioCVkf41EdWQOjSbGS6AsW0D8EnO2rfM3JhkoI2IXhjhpNaYIDjCgVdWw9HL_KNU-Ac6rJWc1kgx_yvsB-ERP1tJHKcG-o-EeKEiWngO2FxeGhGURahgga6lAqi-pmiz/w640-h396/Point%20query,%20part%202_%20QPS%20relative%20to%20PG%2014.10.png" width="640" /></a></div></div><div class="separator" style="clear: both; text-align: left;"><div class="separator" style="clear: both; text-align: center;"><a 
href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiiiVJjl4FYIDW7d60aHjhpQ1BC63MKrA_qWZFvUNjgryTIJYKJE4XA9j2YGiarmG5OwrLr89pBEHQQovkxtR9dCJ38pVMCfz1erl9tJJ5uDAwiixsKZAQoUOiinFoevCDR-h5kLwJDbI5EMWndrDjTcELKBqyBtjUcHSdLyEvVeQ4I0vQM8nOsFJz8S-zX/s600/Range%20query,%20part%201_%20QPS%20relative%20to%20PG%2014.10.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiiiVJjl4FYIDW7d60aHjhpQ1BC63MKrA_qWZFvUNjgryTIJYKJE4XA9j2YGiarmG5OwrLr89pBEHQQovkxtR9dCJ38pVMCfz1erl9tJJ5uDAwiixsKZAQoUOiinFoevCDR-h5kLwJDbI5EMWndrDjTcELKBqyBtjUcHSdLyEvVeQ4I0vQM8nOsFJz8S-zX/w640-h396/Range%20query,%20part%201_%20QPS%20relative%20to%20PG%2014.10.png" width="640" /></a></div></div><div class="separator" style="clear: both; text-align: left;"><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg2ELvMD5GAqDVa9XEJHHWtLIDiN_0z0T2_YakT6QL4UbM8wvIDedIaknDrKV7vtasXd5f2Jx5RBSy_K4VCG7a5YOfq9Drr8qwxKEOCQT_si9ewgwHP9YbSMY0QTyuWFsSXDdAq9cxt6Gyhdxp_xvhrMsYxgEisms_Fc4mLX3oDFA2dCdmowVSZul8xll_P/s600/Range%20query,%20part%202_%20QPS%20relative%20to%20PG%2014.10.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg2ELvMD5GAqDVa9XEJHHWtLIDiN_0z0T2_YakT6QL4UbM8wvIDedIaknDrKV7vtasXd5f2Jx5RBSy_K4VCG7a5YOfq9Drr8qwxKEOCQT_si9ewgwHP9YbSMY0QTyuWFsSXDdAq9cxt6Gyhdxp_xvhrMsYxgEisms_Fc4mLX3oDFA2dCdmowVSZul8xll_P/w640-h396/Range%20query,%20part%202_%20QPS%20relative%20to%20PG%2014.10.png" width="640" /></a></div></div><div class="separator" style="clear: both; text-align: left;"><div class="separator" style="clear: both; text-align: center;"><a 
href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj5y9kAm-nt-anVMh0sRyPvpwb-_CYCaGTRADQFSIS_Z3Un5ooWLGqXSMbkN3RO6CoVF_KpnPf8U0sxfpT3vAASvNDxlv20goJMkG2hHTnyIBJP0RC0FFribx2zyL9nAEeEq1I1VIMpqjTrTh33g6bWJHFquVETiABkMfdWmH2sPBBKtEn9C04D_B-Mdzf-/s600/Writes_%20QPS%20relative%20to%20PG%2014.10.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj5y9kAm-nt-anVMh0sRyPvpwb-_CYCaGTRADQFSIS_Z3Un5ooWLGqXSMbkN3RO6CoVF_KpnPf8U0sxfpT3vAASvNDxlv20goJMkG2hHTnyIBJP0RC0FFribx2zyL9nAEeEq1I1VIMpqjTrTh33g6bWJHFquVETiABkMfdWmH2sPBBKtEn9C04D_B-Mdzf-/w640-h396/Writes_%20QPS%20relative%20to%20PG%2014.10.png" width="640" /></a></div></div><div class="separator" style="clear: both; font-family: courier; text-align: center;"><br /></div><div class="separator" style="clear: both; font-family: courier; text-align: center;"><br /></div></span><div class="separator" style="clear: both; text-align: center;"><br /></div><br /></div><div class="separator" style="clear: both; text-align: center;"><br /></div><br />Mark Callaghanhttp://www.blogger.com/profile/09590445221922043181noreply@blogger.com2tag:blogger.com,1999:blog-9149523927864751087.post-21472518584142610882024-02-16T19:52:00.000-08:002024-02-19T12:37:11.955-08:00Perf regressions in MySQL from 5.6.21 to 8.0.36 using sysbench and a small server<p>This has results for sysbench vs upstream MySQL on a small server. I have results for some 5.6, 5.7 and 8.0 releases up to 8.0.36. My <a href="https://smalldatum.blogspot.com/2023/09/perf-regressions-in-mysql-from-5621-to_18.html">last report</a> stopped at 8.0.34. The goal is to document where things get faster or slower over time for a low-concurrency and CPU-bound workload. The focus is on CPU regressions. </p><p>My results here aren't universal. 
</p><p></p><ul><li>The microbenchmarks here mostly measure CPU overheads</li><li>Things won't look the same with an IO-bound workload. If nothing else, that will make many of the CPU regressions less significant.</li><li>Things won't look the same with a workload that has more concurrency. While MySQL tends to get slower over time from more CPU overhead, it also gets faster over time on concurrent workloads from improvements to synchronization code. Results from a few months ago on a larger server <a href="http://smalldatum.blogspot.com/2023/04/perf-regressions-in-mysqlinnodb-big.html">are here</a> and the regressions are much smaller.</li><li>Things won't look the same with a workload that has complex queries. Most of the queries used by sysbench are simple and short-running. This amplifies the impact of perf regressions in parsing, semantic analysis and query optimization.</li></ul><p></p><p>tl;dr</p><p></p><ul><li>Upstream MySQL would benefit from changepoint detection as provided <a href="https://nyrk.io/">by Nyrkiö</a>.</li><li>MySQL 8.0 is the worst for perf regressions, while 5.7 and 5.6 are better at avoiding them. Also, there tend to be large regressions between the last point release in one major version and the first point release in the following major version, for instance from 5.6.51 to 5.7.10.</li><li>The scan_range=100 microbenchmark that does a full table scan has a large regression from 8.0.28 to 8.0.36 and <a href="https://bugs.mysql.com/bug.php?id=111538">bug 111538</a> is open for this.</li></ul><div>Comparing 8.0.36 with 5.6.21<br /><ul><li>For point queries, 8.0.36 gets 19% to 39% less QPS than 5.6.21</li><li>For range queries that don't do aggregation (part 1), 8.0.36 gets 29% to 39% less QPS than 5.6.21</li><li>For range queries that do aggregation, 8.0.36 gets 3% to 45% less QPS than 5.6.21.
The difference depends on the length of the range scan, where shorter scan == larger regression.</li><li>Full scan (scan_range=100) has the largest regression (5.6.21 is ~2X faster than 8.0.36)</li><li>For most writes (ignoring the <i>update-index</i> microbenchmark), 8.0.36 gets about half of the throughput compared to 5.6.21</li></ul><div><b>Builds</b></div></div><div><div><br /></div><div>It isn't easy to build older code on newer systems, compilers, etc. Notes on that are here <a href="https://twitter.com/MarkCallaghanDB/status/1700551813861449859">for 5.6</a>, <a href="http://smalldatum.blogspot.com/2022/11/compiling-mysql-56-57-on-ubuntu-2204.html">for 5.6 and 5.7</a>, <a href="http://smalldatum.blogspot.com/2023/08/compiling-all-mysql-57-versions-on.html">for 5.7</a> and <a href="http://smalldatum.blogspot.com/2023/05/compiling-all-releases-of-mysql-80.html">for 8.0</a>. A note on using cmake is <a href="http://smalldatum.blogspot.com/2023/02/cmakebuildtype-relwithdebinfo-vs.html">here</a>. The <i>rel</i> builds were used -- everything was compiled using CMAKE_BUILD_TYPE=Release.</div><div><br /></div><div>Tests were done for:</div><div><ul><li>5.6 - 5.6.21, 5.6.31, 5.6.41, 5.6.51</li><li>5.7 - 5.7.10, 5.7.20, 5.7.30, 5.7.44</li><li>8.0 - 8.0.13, 8.0.14, 8.0.20, 8.0.28, 8.0.35, 8.0.36</li></ul><div>I used the cz10a_bee config and it is here for <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/nuc8i7.ub1804/my56/my.cnf.cz10a_bee">5.6</a>, <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/nuc8i7.ub1804/my57/my.cnf.cz10a_bee.8.to.44">5.7</a> and 8.0 (<a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/nuc8i7.ub1804/my80/etc/my.cnf.cz10a_bee.11.to.18">here</a> and <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/nuc8i7.ub1804/my80/etc/my.cnf.cz10a_bee.19.to.34">here</a>). 
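One reason a single config file can be shared across many releases is MySQL's <span style="font-family: courier;">loose_</span> option prefix, which downgrades an unrecognized startup option from a fatal error to a warning. A hypothetical my.cnf fragment (the value shown is illustrative):

```ini
[mysqld]
# The loose_ prefix tells the server to warn, rather than fail to
# start, when it does not recognize the option that follows. That
# lets one config file work across releases that add or remove
# a variable such as innodb_idle_flush_pct.
loose_innodb_idle_flush_pct=1
```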
For 8.0 releases older than 8.0.19 I changed innodb_idle_flush_pct=1 to loose_innodb_idle_flush_pct=1.</div><div><br /></div><div><b>Benchmarks</b></div><div><br />I used sysbench and my usage is <a href="http://smalldatum.blogspot.com/2017/02/using-modern-sysbench-to-compare.html">explained here</a>. There are 42 microbenchmarks; each tests roughly one type of SQL statement and runs for 1200 seconds.</div><div><br /></div><div>Tests were run on a small server I have at home (<a href="http://smalldatum.blogspot.com/2022/10/small-servers-for-performance-testing-v4.html">see here</a>). The server is an SER4 from Beelink with 8 cores, 16G of RAM and one m.2 storage device with XFS and Ubuntu 22.04. The test tables are cached by InnoDB.<br /><br />The benchmark is run with:<br /><ul style="text-align: left;"><li>one connection</li><li>30M rows and a database cached by InnoDB</li><li>each microbenchmark runs for 1200 seconds</li><li>prepared statements were enabled</li></ul></div><div>The command line was: <span style="font-family: courier;">bash r.sh 1 30000000 1200 1200 nvme0n1 1 1 1</span></div><div><br /></div><div><b>Results</b></div><div><br /></div><div>For the results below I split the microbenchmarks into 5 groups -- 2 for point queries, 2 for range queries, 1 for writes. For the range query microbenchmarks, part 1 has queries that don't do aggregation while part 2 has queries that do aggregation. Unfortunately, I included the full scan microbenchmark (scan_range=100) in part 2 even though it doesn't do aggregation. The spreadsheet with all data and charts <a href="https://docs.google.com/spreadsheets/d/1N4XNuoScXElivMeNiSGvcmY3cM0Y6MDfKgD9Vdv8lmY/edit?usp=sharing">is here</a> and is easier to read.</div><div><br />All of the charts have relative throughput on the y-axis where that is (QPS for $me) / (QPS for $base); $me is a version (for example 5.7.20) and $base is the base version.
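That relative throughput metric, and the min/max/avg/median/stdev summary tables below, can be sketched as follows (the QPS numbers here are invented for illustration, not measured results):

```python
import statistics

# Invented QPS for one microbenchmark, by MySQL version.
qps = {"5.6.21": 4840, "5.7.10": 4100, "8.0.13": 3650, "8.0.36": 3390}
base = "5.6.21"

# Relative throughput as plotted: (QPS for $me) / (QPS for $base).
# Values below 1.0 mean $me is slower than the base version.
relative = {v: round(q / qps[base], 2) for v, q in qps.items()}

# Summary statistics over the non-base versions, matching the
# columns of the tables below.
vals = [r for v, r in relative.items() if v != base]
summary = {
    "min": min(vals),
    "max": max(vals),
    "avg": round(statistics.mean(vals), 2),
    "median": round(statistics.median(vals), 2),
    "stdev": round(statistics.stdev(vals), 2),
}
```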
The base version is specified below and is one of 5.6.21, 5.7.10 and 8.0.13 depending on what I am comparing. The y-axis doesn't start at 0 to improve readability.</div><div><br /></div><div>The legend under the x-axis truncates the names I use for the microbenchmarks and I don't know how to fix that other than <a href="https://docs.google.com/spreadsheets/d/1N4XNuoScXElivMeNiSGvcmY3cM0Y6MDfKgD9Vdv8lmY/edit?usp=sharing">sharing the link</a> to the Google Sheet I used. Files I used to create the spreadsheets <a href="https://github.com/mdcallag/mytools/tree/master/bench/arc/feb24.bee.sysbench.my">are here</a>.</div><p></p><p><b>From 5.6.21 to 8.0.36</b></p><p>This section uses 5.6.21 as the base version and then compares it with 5.6.51, 5.7.10, 5.7.44, 8.0.13, 8.0.14, 8.0.20, 8.0.28, 8.0.35 and 8.0.36 to show how performance has changed from the oldest tested release (5.6.21) to the newest (8.0.36).</p><p></p><ul><li>The largest regressions might occur between the last point release in one major version and the first point release in the next major version.</li><li>For point queries, 8.0.36 gets 19% to 39% less QPS vs 5.6.21</li><li>For range queries that don't do aggregation (part 1), 8.0.36 gets 29% to 39% less QPS vs 5.6.21</li><li>For range queries that do aggregation, 8.0.36 gets 3% to 45% less QPS vs 5.6.21. The difference depends on the length of the range scan -- shorter scan == larger regression.
And full scan (scan_range=100) has the largest regression.</li><li>For most writes (ignoring the <i>update-index</i> microbenchmark), 8.0.36 gets about half of the throughput compared to 5.6.21</li></ul><div>Summary statistics for each of the benchmark groupings:</div><div><br /><google-sheets-html-origin><table border="1" cellpadding="0" cellspacing="0" data-sheets-root="1" dir="ltr" style="border-collapse: collapse; border: none; font-family: Arial; font-size: 10pt; table-layout: fixed; width: 0px;" xmlns="http://www.w3.org/1999/xhtml"><colgroup><col width="100"></col><col width="100"></col><col width="100"></col><col width="100"></col><col width="100"></col><col width="100"></col></colgroup><tbody><tr style="height: 21px;"><td style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;"></td><td data-sheets-value="{"1":2,"2":"min"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">min</td><td data-sheets-value="{"1":2,"2":"max"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">max</td><td data-sheets-value="{"1":2,"2":"avg"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">avg</td><td data-sheets-value="{"1":2,"2":"median"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">median</td><td data-sheets-value="{"1":2,"2":"stdev"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">stdev</td></tr><tr style="height: 21px;"><td data-sheets-value="{"1":2,"2":"point-1"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">point-1</td><td data-sheets-formula="=min(R[-49]C[6]:R[-39]C[6])" data-sheets-numberformat="{"1":2,"2":"0.00","3":1}" data-sheets-value="{"1":3,"3":0.63}" style="border: 1px 
solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.63</td><td data-sheets-formula="=max(R[-49]C[5]:R[-39]C[5])" data-sheets-numberformat="{"1":2,"2":"0.00","3":1}" data-sheets-value="{"1":3,"3":0.78}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.78</td><td data-sheets-formula="=average(R[-49]C[4]:R[-39]C[4])" data-sheets-numberformat="{"1":2,"2":"0.00","3":1}" data-sheets-value="{"1":3,"3":0.7090909090909091}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.71</td><td data-sheets-formula="=median(R[-49]C[3]:R[-39]C[3])" data-sheets-numberformat="{"1":2,"2":"0.00","3":1}" data-sheets-value="{"1":3,"3":0.69}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.69</td><td data-sheets-formula="=stdev(R[-49]C[2]:R[-39]C[2])" data-sheets-numberformat="{"1":2,"2":"0.00","3":1}" data-sheets-value="{"1":3,"3":0.04504543161177291}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.05</td></tr><tr style="height: 21px;"><td data-sheets-value="{"1":2,"2":"point-2"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">point-2</td><td data-sheets-formula="=min(R[-38]C[6]:R[-33]C[6])" data-sheets-numberformat="{"1":2,"2":"0.00","3":1}" data-sheets-value="{"1":3,"3":0.61}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.61</td><td data-sheets-formula="=max(R[-38]C[5]:R[-33]C[5])" data-sheets-numberformat="{"1":2,"2":"0.00","3":1}" data-sheets-value="{"1":3,"3":0.81}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: 
bottom;">0.81</td><td data-sheets-formula="=average(R[-38]C[4]:R[-33]C[4])" data-sheets-numberformat="{"1":2,"2":"0.00","3":1}" data-sheets-value="{"1":3,"3":0.7050000000000001}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.71</td><td data-sheets-formula="=median(R[-38]C[3]:R[-33]C[3])" data-sheets-numberformat="{"1":2,"2":"0.00","3":1}" data-sheets-value="{"1":3,"3":0.695}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.70</td><td data-sheets-formula="=stdev(R[-38]C[2]:R[-33]C[2])" data-sheets-numberformat="{"1":2,"2":"0.00","3":1}" data-sheets-value="{"1":3,"3":0.08983317872590285}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.09</td></tr><tr style="height: 21px;"><td data-sheets-value="{"1":2,"2":"range-1"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">range-1</td><td data-sheets-formula="=min(R[-32]C[6]:R[-25]C[6])" data-sheets-numberformat="{"1":2,"2":"0.00","3":1}" data-sheets-value="{"1":3,"3":0.61}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.61</td><td data-sheets-formula="=max(R[-32]C[5]:R[-25]C[5])" data-sheets-numberformat="{"1":2,"2":"0.00","3":1}" data-sheets-value="{"1":3,"3":0.71}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.71</td><td data-sheets-formula="=average(R[-32]C[4]:R[-25]C[4])" data-sheets-numberformat="{"1":2,"2":"0.00","3":1}" data-sheets-value="{"1":3,"3":0.6399999999999999}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.64</td><td data-sheets-formula="=median(R[-32]C[3]:R[-25]C[3])" 
data-sheets-numberformat="{"1":2,"2":"0.00","3":1}" data-sheets-value="{"1":3,"3":0.62}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.62</td><td data-sheets-formula="=stdev(R[-32]C[2]:R[-25]C[2])" data-sheets-numberformat="{"1":2,"2":"0.00","3":1}" data-sheets-value="{"1":3,"3":0.04070801956792857}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.04</td></tr><tr style="height: 21px;"><td data-sheets-value="{"1":2,"2":"range-2"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">range-2</td><td data-sheets-formula="=min(R[-24]C[6]:R[-18]C[6])" data-sheets-numberformat="{"1":2,"2":"0.00","3":1}" data-sheets-value="{"1":3,"3":0.55}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.55</td><td data-sheets-formula="=max(R[-24]C[5]:R[-18]C[5])" data-sheets-numberformat="{"1":2,"2":"0.00","3":1}" data-sheets-value="{"1":3,"3":0.97}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.97</td><td data-sheets-formula="=average(R[-24]C[4]:R[-18]C[4])" data-sheets-numberformat="{"1":2,"2":"0.00","3":1}" data-sheets-value="{"1":3,"3":0.7557142857142857}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.76</td><td data-sheets-formula="=median(R[-24]C[3]:R[-18]C[3])" data-sheets-numberformat="{"1":2,"2":"0.00","3":1}" data-sheets-value="{"1":3,"3":0.74}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.74</td><td data-sheets-formula="=stdev(R[-24]C[2]:R[-18]C[2])" data-sheets-numberformat="{"1":2,"2":"0.00","3":1}" 
data-sheets-value="{"1":3,"3":0.16060896019599308}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.16</td></tr><tr style="height: 21px;"><td data-sheets-value="{"1":2,"2":"writes"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">writes</td><td data-sheets-formula="=min(R[-17]C[6]:R[-8]C[6])" data-sheets-numberformat="{"1":2,"2":"0.00","3":1}" data-sheets-value="{"1":3,"3":0.44}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.44</td><td data-sheets-formula="=max(R[-17]C[5]:R[-8]C[5])" data-sheets-numberformat="{"1":2,"2":"0.00","3":1}" data-sheets-value="{"1":3,"3":1.08}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">1.08</td><td data-sheets-formula="=average(R[-17]C[4]:R[-8]C[4])" data-sheets-numberformat="{"1":2,"2":"0.00","3":1}" data-sheets-value="{"1":3,"3":0.625}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.63</td><td data-sheets-formula="=median(R[-17]C[3]:R[-8]C[3])" data-sheets-numberformat="{"1":2,"2":"0.00","3":1}" data-sheets-value="{"1":3,"3":0.5549999999999999}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.56</td><td data-sheets-formula="=stdev(R[-17]C[2]:R[-8]C[2])" data-sheets-numberformat="{"1":2,"2":"0.00","3":1}" data-sheets-value="{"1":3,"3":0.18951692976266438}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">0.19</td></tr></tbody></table></google-sheets-html-origin></div><div class="separator" style="clear: both; text-align: center;"><a 
href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiTm_9W-Hd4rcZktlcIK6CvRS1b7t7f-TzlLpBrO4nnyKDFdGmWdUBaQSq0qi2vQLJ6EhTnXcmAC4bhh4T5U8Rfdv3qejMzk_zsNlEMz8ySOd6EOuaV6eM6PtWZdkOh6ZujP13WHGYovvQHv7k_RJORpSb-4oDTKks23aw65mznWjew1s0pfis-H1OX8csN/s600/Point%20query,%20part%201_%20MySQL%205.6,%205.7,%208.0%20relative%20to%205.6.21.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiTm_9W-Hd4rcZktlcIK6CvRS1b7t7f-TzlLpBrO4nnyKDFdGmWdUBaQSq0qi2vQLJ6EhTnXcmAC4bhh4T5U8Rfdv3qejMzk_zsNlEMz8ySOd6EOuaV6eM6PtWZdkOh6ZujP13WHGYovvQHv7k_RJORpSb-4oDTKks23aw65mznWjew1s0pfis-H1OX8csN/w640-h396/Point%20query,%20part%201_%20MySQL%205.6,%205.7,%208.0%20relative%20to%205.6.21.png" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhwwUEP1ZvDAGd-xFrSksG0wZX1iIXMYntga7vgVEmVRSU8OFVwiur0hSBA1DyWEYIm5MncEstvnV1Izk38gl3XybDz_rL7616O2O7bRCKifD0zcutoqtpikY7ocft3h9rkTq32T4PHmjdnQF-Sr7fEXofUjPku3IHuVqYy-MvfAeSyy3lDo_aBQzJOG10F/s600/Point%20query,%20part%202_%20MySQL%205.6,%205.7,%208.0%20relative%20to%205.6.21.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhwwUEP1ZvDAGd-xFrSksG0wZX1iIXMYntga7vgVEmVRSU8OFVwiur0hSBA1DyWEYIm5MncEstvnV1Izk38gl3XybDz_rL7616O2O7bRCKifD0zcutoqtpikY7ocft3h9rkTq32T4PHmjdnQF-Sr7fEXofUjPku3IHuVqYy-MvfAeSyy3lDo_aBQzJOG10F/w640-h396/Point%20query,%20part%202_%20MySQL%205.6,%205.7,%208.0%20relative%20to%205.6.21.png" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"><a 
href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgO03iU7uMESMF4V1Wx6eRUFw1DeSvSYOuc4Ga6AcOkh94_eP92wGxYsj17hHkMQurfKYn-JNspMMowzqoY_iKCBirBDwLIOd5yJ-gKhr8HRmhde1MCQnso9XlEGjzkfvK7H71M-lMnGD4t3zNlpJ6ZoD7Jt8_6w2VyjntSV5afNIr7s4Bc2B7uvl7LrVCc/s600/Range%20query,%20part%201_%20MySQL%205.6,%205.7,%208.0%20relative%20to%205.6.21.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgO03iU7uMESMF4V1Wx6eRUFw1DeSvSYOuc4Ga6AcOkh94_eP92wGxYsj17hHkMQurfKYn-JNspMMowzqoY_iKCBirBDwLIOd5yJ-gKhr8HRmhde1MCQnso9XlEGjzkfvK7H71M-lMnGD4t3zNlpJ6ZoD7Jt8_6w2VyjntSV5afNIr7s4Bc2B7uvl7LrVCc/w640-h396/Range%20query,%20part%201_%20MySQL%205.6,%205.7,%208.0%20relative%20to%205.6.21.png" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjdw33LOtjV1c2nr10Yy3mNuPYwMYsBW5koBU8i9HNrR84udY3piPIPMv_jmqPjCAo9CFvVIKprmeduFIvq6z-zY_G8AmpqQPIU-ikQNVZ8lNC_VbZnYT169OK_qZfUHwsM-bYrQVSlVoK1MxEJhMsw7iXZQtmGVdgYenuaoJOgaH_AUcUoZmI6d6GBleKh/s600/Range%20query,%20part%202_%20MySQL%205.6,%205.7,%208.0%20relative%20to%205.6.21.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjdw33LOtjV1c2nr10Yy3mNuPYwMYsBW5koBU8i9HNrR84udY3piPIPMv_jmqPjCAo9CFvVIKprmeduFIvq6z-zY_G8AmpqQPIU-ikQNVZ8lNC_VbZnYT169OK_qZfUHwsM-bYrQVSlVoK1MxEJhMsw7iXZQtmGVdgYenuaoJOgaH_AUcUoZmI6d6GBleKh/w640-h396/Range%20query,%20part%202_%20MySQL%205.6,%205.7,%208.0%20relative%20to%205.6.21.png" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"><a 
href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjL6LuoBvinKJ8QMD4uEIkAXrmYXSC13fbA1oh6aTu71YywmZjfGBJdgHYFbDq2YjPClPfYdLbzamgBGa0OR9r-51ZZOZSzc0wdxASnDwztZp82YqRDJU4qKcXtw0izqy2lWppWiTumhdO4jBiDty4WMgk5H19idbuXZW1LGYWNwJ4brkP37VzQ-KesyBkz/s600/Writes_%20%20MySQL%205.6,%205.7,%208.0%20relative%20to%205.6.21.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjL6LuoBvinKJ8QMD4uEIkAXrmYXSC13fbA1oh6aTu71YywmZjfGBJdgHYFbDq2YjPClPfYdLbzamgBGa0OR9r-51ZZOZSzc0wdxASnDwztZp82YqRDJU4qKcXtw0izqy2lWppWiTumhdO4jBiDty4WMgk5H19idbuXZW1LGYWNwJ4brkP37VzQ-KesyBkz/w640-h396/Writes_%20%20MySQL%205.6,%205.7,%208.0%20relative%20to%205.6.21.png" width="640" /></a></div><div><b>MySQL 8.0: some point releases</b></div><div><br /></div><div>This section uses 8.0.13 as the base version and then compares that with 8.0.14, 8.0.20, 8.0.28, 8.0.35 and 8.0.36 to show how performance has changed from 8.0.13 to 8.0.36.</div><div><br /></div><div>There was a perf bug in 8.0.28 (<a href="https://bugs.mysql.com/bug.php?id=102037">bug 102037</a>) from the optimizer for queries with large in-lists that explains the two results below in <i>Point query, part 2</i> that are close to 0.40.</div><div><br /></div><div>From MySQL 8.0.13 to 8.0.36</div><div><ul style="text-align: left;"><li>Point queries are ~5% slower in 8.0.36</li><li>Range queries without aggregation are between 6% and 15% slower in 8.0.36 and for a few microbenchmarks there is a big regression after 8.0.28 (possibly <a href="https://bugs.mysql.com/bug.php?id=111538">bug 111538</a>)</li><li>Range queries with aggregation are mostly ~15% slower in 8.0.36</li><li>Full scan is ~32% slower in 8.0.36 with a big regression after 8.0.28 (possibly <a href="https://bugs.mysql.com/bug.php?id=111538">bug 111538</a>)</li><li>Writes are ~20% slower in 8.0.36 with a big regression after 
8.0.20</li></ul></div><div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhCNX9H9V6sdFnjmUbSY_9nwnTKWrr-nd0_06A0uQ-Y9w2yRhZw1F5Cw7aXX_uxYfiqX_9YNn1kNKsZb6ngdA5lNo7HNsU5x7IzaClQmiTpQcUlxLafH03FeslcKK6qMnTQeQy4XtcThBaF4QfrehKDzkFZJhhx5Z2xOUFNIFjp9_tJhS3-xcqYFBdhWWJl/s600/Point%20query,%20part%201_%20MySQL%208.0,%20relative%20to%208.0.13.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhCNX9H9V6sdFnjmUbSY_9nwnTKWrr-nd0_06A0uQ-Y9w2yRhZw1F5Cw7aXX_uxYfiqX_9YNn1kNKsZb6ngdA5lNo7HNsU5x7IzaClQmiTpQcUlxLafH03FeslcKK6qMnTQeQy4XtcThBaF4QfrehKDzkFZJhhx5Z2xOUFNIFjp9_tJhS3-xcqYFBdhWWJl/w640-h396/Point%20query,%20part%201_%20MySQL%208.0,%20relative%20to%208.0.13.png" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhANYV_sqK4AvcBFzNfTMWdlUVRllJe2ADPlAvndZ9sXT4LQaFAVxh4IsxDGPbuaeF4B3PlqaRhcjUkDHCJm6Ui740XzcBQFj4VIxTXJ0IdT25bw3bu6eP0kXL83dm_HA_B4vpvTUqNmOPzEwXLyQibotI4NuSrrTIeHF1xhMWb7Wyd2Fppb2RF1Xgrd3AY/s600/Point%20query,%20part%202_%20MySQL%208.0,%20relative%20to%208.0.13.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhANYV_sqK4AvcBFzNfTMWdlUVRllJe2ADPlAvndZ9sXT4LQaFAVxh4IsxDGPbuaeF4B3PlqaRhcjUkDHCJm6Ui740XzcBQFj4VIxTXJ0IdT25bw3bu6eP0kXL83dm_HA_B4vpvTUqNmOPzEwXLyQibotI4NuSrrTIeHF1xhMWb7Wyd2Fppb2RF1Xgrd3AY/w640-h396/Point%20query,%20part%202_%20MySQL%208.0,%20relative%20to%208.0.13.png" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"><a 
href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEigBXzm_-TTJpcBxjg5Dd5W7komXlyfBDYBknWHaLHXbnX7MXQ3dnFPVXjtDqCbwwfv9vLw5StnkAeQQUlymev9s39D_JneKs6eY0WEn0U8thpiuvSsDQVnyxGu4ehDZRlKhgSblxWzrlAGrdvrn9iaktikg1Sun58EnNT9cmwI9goJA4TDJpmRT5yqI3I4/s600/Range%20query,%20part%201_%20MySQL%208.0,%20relative%20to%208.0.13.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEigBXzm_-TTJpcBxjg5Dd5W7komXlyfBDYBknWHaLHXbnX7MXQ3dnFPVXjtDqCbwwfv9vLw5StnkAeQQUlymev9s39D_JneKs6eY0WEn0U8thpiuvSsDQVnyxGu4ehDZRlKhgSblxWzrlAGrdvrn9iaktikg1Sun58EnNT9cmwI9goJA4TDJpmRT5yqI3I4/w640-h396/Range%20query,%20part%201_%20MySQL%208.0,%20relative%20to%208.0.13.png" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj7T-BAzrXuSAYg_wJbxIk46pofTqhOYXwrAwEwof8MtyW95i5E03qLrwB5bcpZ76b5fE8WAIgKN0xpNwgl5zOnEjdr8CNcBQsR4SfBME4aDtTZY9N7HmqbuW18aJQbYj9ES0kIEll-7pKnLkfFomp3eYhwDd_PTucGIab-ZNFii5GdJOn_NiM1uPaAmn00/s600/Range%20query,%20part%202_%20MySQL%208.0,%20relative%20to%208.0.13.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj7T-BAzrXuSAYg_wJbxIk46pofTqhOYXwrAwEwof8MtyW95i5E03qLrwB5bcpZ76b5fE8WAIgKN0xpNwgl5zOnEjdr8CNcBQsR4SfBME4aDtTZY9N7HmqbuW18aJQbYj9ES0kIEll-7pKnLkfFomp3eYhwDd_PTucGIab-ZNFii5GdJOn_NiM1uPaAmn00/w640-h396/Range%20query,%20part%202_%20MySQL%208.0,%20relative%20to%208.0.13.png" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"><a 
href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhysqQKkv6cx0mF5TT0J8eOkm0DJLh4dGSq8nTNnZKzZ3Bkl3ARv-Y4Kcul88bz0ZtAXX4SdFWfGaJfYowciDmC5YRVbgz1cIpfqqVWP3tVWbtPNtxRheSbYUT_hz5CDQgy2vbd22URz2TvjUkr4D01OB7rdXsggHEZU9XTSXUX1RMy0bi5NHH_54ain3cv/s600/Writes_%20MySQL%208.0,%20relative%20to%208.0.13.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhysqQKkv6cx0mF5TT0J8eOkm0DJLh4dGSq8nTNnZKzZ3Bkl3ARv-Y4Kcul88bz0ZtAXX4SdFWfGaJfYowciDmC5YRVbgz1cIpfqqVWP3tVWbtPNtxRheSbYUT_hz5CDQgy2vbd22URz2TvjUkr4D01OB7rdXsggHEZU9XTSXUX1RMy0bi5NHH_54ain3cv/w640-h396/Writes_%20MySQL%208.0,%20relative%20to%208.0.13.png" width="640" /></a></div><div><b>MySQL 5.7: some point releases</b></div></div><div><br /></div><div>This section uses 5.7.10 as the base version and then compares that with 5.7.20, 5.7.30 and 5.7.44 to show how performance has changed from 5.7.10 to 5.7.44.</div><div><br /></div><div>For most microbenchmarks the throughput in 5.7.44 is no more than 5% less than in 5.7.10. 
For two microbenchmarks (<i>update-index</i> and <i>update-inlist</i>) the throughput in 5.7.44 is larger than in 5.7.10.</div><div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhs0joYUb9ZIWzQSosWdp-raCXmcUPSSeYRI1AT35200_iI-_fl1QL_A99_bAfXHAzsPlegGkMeRkMwVH-7DCjxSJ7i_Jqq7YzwYCPRpbfA9Ytj_OkuKKq9XK7l8qAxZYOkUDGO-SEj-5ty21pmOxO4knQpukjPPNwbRkJ7tKfUV21x9_eNelYpvlkUUUfV/s600/Point%20query,%20part%201_%20MySQL%205.7,%20relative%20to%205.7.10.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhs0joYUb9ZIWzQSosWdp-raCXmcUPSSeYRI1AT35200_iI-_fl1QL_A99_bAfXHAzsPlegGkMeRkMwVH-7DCjxSJ7i_Jqq7YzwYCPRpbfA9Ytj_OkuKKq9XK7l8qAxZYOkUDGO-SEj-5ty21pmOxO4knQpukjPPNwbRkJ7tKfUV21x9_eNelYpvlkUUUfV/w640-h396/Point%20query,%20part%201_%20MySQL%205.7,%20relative%20to%205.7.10.png" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjVyCf73qdujfWISB6_H65d7Zzi2OTmUHiBH_iht21yvmjXih826_0ijupsXulD95VvU4pUqYihXzFL7o5a_pOMk4uh0bqn2mTb42voiDt8ou8Il2Gato6-AdoX27LtNek_xHZrMXimFUzW3GrUseavlVuzhTcpwG3fq9u2NxMGsD61GItLgr-dZ8aPwDDb/s600/Point%20query,%20part%202_%20MySQL%205.7,%20relative%20to%205.7.10.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjVyCf73qdujfWISB6_H65d7Zzi2OTmUHiBH_iht21yvmjXih826_0ijupsXulD95VvU4pUqYihXzFL7o5a_pOMk4uh0bqn2mTb42voiDt8ou8Il2Gato6-AdoX27LtNek_xHZrMXimFUzW3GrUseavlVuzhTcpwG3fq9u2NxMGsD61GItLgr-dZ8aPwDDb/w640-h396/Point%20query,%20part%202_%20MySQL%205.7,%20relative%20to%205.7.10.png" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"><a 
href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgfjSP3zl_gjjvVwzc2SRcX4E-VPEFPJqB1ZRVA6EONwl7rq432TjUXDL6EiugJ8zAb3lN7YA6Rj_YEFRpHXoJmiFcxXPpZc0A8RP8ZdIIdq-6JEJOOBHS9Azdld0ODFCkzdOYAmuRa_96XWwBerWRdqoUL0FAP-cxKHiIaM11ygfZed_8NhzWipH2sqeQ3/s600/Range%20query,%20part%201_%20MySQL%205.7,%20relative%20to%205.7.10.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgfjSP3zl_gjjvVwzc2SRcX4E-VPEFPJqB1ZRVA6EONwl7rq432TjUXDL6EiugJ8zAb3lN7YA6Rj_YEFRpHXoJmiFcxXPpZc0A8RP8ZdIIdq-6JEJOOBHS9Azdld0ODFCkzdOYAmuRa_96XWwBerWRdqoUL0FAP-cxKHiIaM11ygfZed_8NhzWipH2sqeQ3/w640-h396/Range%20query,%20part%201_%20MySQL%205.7,%20relative%20to%205.7.10.png" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi-IXAZvWrN8EOFWQgSgIuUwHGNcxtTLTlFLXAUttiB9bm968AlDC0QKHTKwLjok1lyuWVMfTgH0fCmReINL5y2dz33EFQtk61l6GZ3YQoS-kANkid35qx8q-69Yml5WIzTYwRHJbyLj8QRE_gVW49igpsGUXCHDUX_v8LR2LDXAYGXXJEl62kKBXAkVRkP/s600/Range%20query,%20part%202_%20MySQL%205.7,%20relative%20to%205.7.10.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi-IXAZvWrN8EOFWQgSgIuUwHGNcxtTLTlFLXAUttiB9bm968AlDC0QKHTKwLjok1lyuWVMfTgH0fCmReINL5y2dz33EFQtk61l6GZ3YQoS-kANkid35qx8q-69Yml5WIzTYwRHJbyLj8QRE_gVW49igpsGUXCHDUX_v8LR2LDXAYGXXJEl62kKBXAkVRkP/w640-h396/Range%20query,%20part%202_%20MySQL%205.7,%20relative%20to%205.7.10.png" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"><a 
href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhppNmh6-SZv4oai2z-TaLXtQM2qYSsP6QRAARzIc7MmOegVhpFbJm6KVUu5RyyhDT25uMOdE5Nj-n9xIdsYQsR00J2omE7QpgPKLPIvJXUq22bhuNCXqk67k-c4j25sk8z6zcN-r_cfcfJJZm3EwkBQYbEK2vnXhWwmj4on6u1EoVTPyy4sH3y9Oh9jDtA/s600/Writes_%20MySQL%205.7,%20relative%20to%205.7.10.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhppNmh6-SZv4oai2z-TaLXtQM2qYSsP6QRAARzIc7MmOegVhpFbJm6KVUu5RyyhDT25uMOdE5Nj-n9xIdsYQsR00J2omE7QpgPKLPIvJXUq22bhuNCXqk67k-c4j25sk8z6zcN-r_cfcfJJZm3EwkBQYbEK2vnXhWwmj4on6u1EoVTPyy4sH3y9Oh9jDtA/w640-h396/Writes_%20MySQL%205.7,%20relative%20to%205.7.10.png" width="640" /></a></div><div><b>MySQL 5.6: some point releases</b></div></div><div><br /></div><div>This section uses 5.6.21 as the base version and then compares that with 5.6.31, 5.6.41 and 5.6.51 to show how performance has changed from 5.6.21 to 5.6.51.</div><div><br /></div><div>For most microbenchmarks the throughput in 5.6.51 is no more than 5% less than in 5.6.21. 
The largest regression is ~10% from full scan (<i>scan_range=100</i>) and 5.6.51 is faster than 5.6.21 for the <i>update-inlist</i> microbenchmark.</div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiNLT9GGh1hKPXMS_9Mf6akWV15ZF5h__RCLVmkPYRNPRXBrXOrZnjCtKVG_Z9bw6AQNfm9ommM7g_khLIkag86ExzhvD3484RMbiX8mjqip3bdeYfVo4vv8WXLzT5uehCoeYNe1sYnTfS7XtkW2CBrTh1KRIXm_5H6Z5fbnDfZn04EpIet-X6AKC3eTQwB/s600/Point%20query,%20part%201_%20MySQL%205.6,%20relative%20to%205.6.21.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiNLT9GGh1hKPXMS_9Mf6akWV15ZF5h__RCLVmkPYRNPRXBrXOrZnjCtKVG_Z9bw6AQNfm9ommM7g_khLIkag86ExzhvD3484RMbiX8mjqip3bdeYfVo4vv8WXLzT5uehCoeYNe1sYnTfS7XtkW2CBrTh1KRIXm_5H6Z5fbnDfZn04EpIet-X6AKC3eTQwB/w640-h396/Point%20query,%20part%201_%20MySQL%205.6,%20relative%20to%205.6.21.png" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh2btSHTPCzpovs7IdtWSgdDG4w_UDkhBQasBNM_z0Aj3_hIokUot8EoWUWRdvHtnbT7vUrSz8ovBwpx3KLQ6t7DSyst0dVSvUyz3l5Xo7aydvSuqoHyOEpPMhH7XbC7KTsl6ohNW3wwnxgkr9evfSP8zhyphenhyphenlhI46sMujAS3vq8a-e9J8CZxMulezj0kr_4C/s600/Point%20query,%20part%202_%20MySQL%205.6,%20relative%20to%205.6.21.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh2btSHTPCzpovs7IdtWSgdDG4w_UDkhBQasBNM_z0Aj3_hIokUot8EoWUWRdvHtnbT7vUrSz8ovBwpx3KLQ6t7DSyst0dVSvUyz3l5Xo7aydvSuqoHyOEpPMhH7XbC7KTsl6ohNW3wwnxgkr9evfSP8zhyphenhyphenlhI46sMujAS3vq8a-e9J8CZxMulezj0kr_4C/w640-h396/Point%20query,%20part%202_%20MySQL%205.6,%20relative%20to%205.6.21.png" width="640" /></a></div><div class="separator" style="clear: both; 
text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgWEQATZU7l6pVlKkLEfwaWO4wm1kHXmduUfrB4MNhDqQ-GcOQtNk4jMQlXDoH4jgyLb2m1E0DGV7BchvyLrmv7XOg3EdYJlxIAOyvDtWVZZ96snYMXaoHCJf4NUinFXxQXAjGR4YyQml33XJ1Kdb48JNhLMieDRz4JLEKTN-Dj12QoqgGQ8Zd4egpRlArM/s600/Range%20query,%20part%201_%20MySQL%205.6,%20relative%20to%205.6.21.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgWEQATZU7l6pVlKkLEfwaWO4wm1kHXmduUfrB4MNhDqQ-GcOQtNk4jMQlXDoH4jgyLb2m1E0DGV7BchvyLrmv7XOg3EdYJlxIAOyvDtWVZZ96snYMXaoHCJf4NUinFXxQXAjGR4YyQml33XJ1Kdb48JNhLMieDRz4JLEKTN-Dj12QoqgGQ8Zd4egpRlArM/w640-h396/Range%20query,%20part%201_%20MySQL%205.6,%20relative%20to%205.6.21.png" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEitiTx0gyOQ9mfuSjV7Dq-9kBU1WAE_Llyh7vUd4rhg_ehz_kby6SkmkwA4gX5c6uz8B3XxwFoxlbKtPziwGL1Ye2JgG_MRY6Fe5t3qMv_oVLmhDvJy08CDsJVHyaRghKUox6RP4b2ojDbuMSDzFd6i185L5y41kGIGvaMimaA5UDmrAUzjFilCys52E-LC/s600/Range%20query,%20part%202_%20MySQL%205.6,%20relative%20to%205.6.21.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEitiTx0gyOQ9mfuSjV7Dq-9kBU1WAE_Llyh7vUd4rhg_ehz_kby6SkmkwA4gX5c6uz8B3XxwFoxlbKtPziwGL1Ye2JgG_MRY6Fe5t3qMv_oVLmhDvJy08CDsJVHyaRghKUox6RP4b2ojDbuMSDzFd6i185L5y41kGIGvaMimaA5UDmrAUzjFilCys52E-LC/w640-h396/Range%20query,%20part%202_%20MySQL%205.6,%20relative%20to%205.6.21.png" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"><a 
href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjHdkINgpUWghiH2tWcv9UvvLfkDtnF5lKh8-HL71-QCmZGvY3CX0vgzQ0Eleh1S7XC6671JN3seRH_ybNIsnkEovo4IQuoTYL5j7yrS5tBEBjXd3eNaiOVvg1SmoO53PJgmLjnq5zJ4hi5IOvR_DBKJ_fsiw5x4-vwMbKUdyLExWkGRP9t4oe1xtN1kdCo/s600/Writes_%20MySQL%205.6,%20relative%20to%205.6.21.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjHdkINgpUWghiH2tWcv9UvvLfkDtnF5lKh8-HL71-QCmZGvY3CX0vgzQ0Eleh1S7XC6671JN3seRH_ybNIsnkEovo4IQuoTYL5j7yrS5tBEBjXd3eNaiOVvg1SmoO53PJgmLjnq5zJ4hi5IOvR_DBKJ_fsiw5x4-vwMbKUdyLExWkGRP9t4oe1xtN1kdCo/w640-h396/Writes_%20MySQL%205.6,%20relative%20to%205.6.21.png" width="640" /></a></div></div>Mark Callaghanhttp://www.blogger.com/profile/09590445221922043181noreply@blogger.com0tag:blogger.com,1999:blog-9149523927864751087.post-1259289143519057952024-02-12T14:16:00.000-08:002024-02-17T09:36:13.301-08:00It wasn't a performance regression in Postgres 14<p>With help from a Postgres expert (Peter Geoghegan) I was able to confirm there wasn't a performance regression for Postgres 14 in a few of the benchmark steps with the Insert Benchmark as I started to report on in a <a href="https://smalldatum.blogspot.com/2024/01/explaining-performance-regression-in.html">previous blog post</a>. The results here are from a small server for both cached and IO-bound workloads and replace my previous blog posts (<a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to_24.html">cached</a>, <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to_27.html">IO-bound</a>).</p><p>The reason for the false alarm is that index cleanup was skipped during vacuum starting with Postgres 14 and the impact is that the optimizer had more work to do (more not-visible index entries to skip) in the get_actual_variable_range function. 
Output like this from the vacuum command makes that obvious:</p><p><i>table "pi1": index scan bypassed: 48976 pages from table (0.62% of total) have 5000000 dead item identifiers</i></p><p>The problem is solved by adding <i>INDEX_CLEANUP ON</i> to the <a href="https://www.postgresql.org/docs/current/sql-vacuum.html">vacuum</a> command.</p><p>tl;dr</p><p></p><ul style="text-align: left;"><li>All results here are from a low-concurrency workload on a small server (1 client, <= 3 connections, 8 cores). Results from a bigger server are pending. </li><li>For cached workloads throughput improves a lot on all benchmark steps except point query (qp*) where from Postgres 9.0 to 16 it is slightly better on the SER4 server and then stable to slightly slower on the SER7 server.</li><li>For IO-bound workloads on the SER4 server from Postgres 9.0 through 16 the results are similar to the cached workload -- things improve a lot for all benchmark steps except point query (qp*).</li><li>For IO-bound workloads on the SER7 server from Postgres 12 through 16 the throughput for range and point queries (qr*, qp*) is stable while for write-heavy there are some regressions. Also there is a ~5% increase in CPU/operation from Postgres 12 through 16 on the random-write benchmark steps (l.i1, l.i2).</li><li>For IO-bound workloads with the SER4 server the benchmark steps were unable to sustain the target write rates during the qr1000 and qp1000 benchmark steps (1000 inserts/s + 1000 deletes/s) for many of the Postgres versions. This was not an issue on the SER7 server, which has a faster CPU and a better RAM / data ratio.</li><li>The delete/s rate is between 5X and 20X larger for the l.i1 benchmark step vs the l.i2 step. 
The issue is that l.i1 deletes 10X more rows/statement and the impact of the optimizer CPU overhead is much worse during l.i2 -- see my comments below about the weird workload.</li></ul><div><b>Update</b> - I have claimed that InnoDB and MyRocks don't have this problem and that is more truthy than true because they had a problem from MVCC GC getting behind, but the problem shows up on a SELECT statement while Postgres has the problem with a DELETE statement. <a href="https://smalldatum.blogspot.com/2023/07/myrocks-innodb-and-postgres-as-queue.html">See here</a> for details.</div><p></p><p><b>Editorial</b></p><p>At a high-level there are several issues:</p><p></p><ol style="text-align: left;"><li>The workload that triggers the issue is weird (uncommon)</li><li>With MVCC GC in Postgres, garbage can remain in indexes for a while</li><li>The Postgres optimizer can use too much CPU time in get_actual_variable_range to help figure out index selectivity even when there is only one good index for a SQL statement</li></ol><div>First, the workload that triggers the issue is weird. I hope to rewrite the Insert Benchmark later this year to be less weird. The weirdness is that last year I enhanced the Insert Benchmark to optionally delete from tables at the same rate as inserts to keep the tables from growing too big. A too big table might make my tests fail when a disk is full. And a too big table means I can't run the cached (in-memory) variant of the test for longer periods of time. But the problem is that the enhancements meant I added statements like <i>DELETE FROM foo WHERE pk_column > $a AND pk_column < $b</i>. 
The constants $a and $b are usually in or even less than the histogram bucket with the smallest value for the column which means that get_actual_variable_range then tries to read from the index to determine the current minimum value for that index.</div><div><br /></div><div>Second, with MVCC GC in Postgres, garbage can remain in indexes for longer than I want it to.</div><div><ul style="text-align: left;"><li>MVCC GC in InnoDB is called purge and is running all of the time -- although it can be slowed by read IO latency and temporarily halted when there is a long-open snapshot. But most of the time InnoDB will cleanup (remove non-visible index entries) soon after transaction commit.</li><li>Cleanup with Postgres has more lag. It can be done by vacuum with a lag of many hours. It can also be done by simple index deletion but only on page splits and my workload doesn't trigger page splits on DELETE. Perhaps the existing code can be updated to also trigger simple index deletion when an index leaf page is mostly or all non-visible entries. I risk writing nonsense about Postgres in this regard, and for better information see <a href="https://www.youtube.com/watch?v=JDG4bMHxCH8&t=1823s">this video</a> from Peter and this <a href="https://www.postgresql.org/docs/current/btree-implementation.html#BTREE-DELETION">document page</a>.</li></ul></div><p></p><p>Third, the Postgres optimizer can use too much CPU time in get_actual_variable_range. I don't mind that get_actual_variable_range exists because it is useful for cases where index statistics are not current. But the problem is that for the problematic SQL statement (see the DELETE above and <a href="https://smalldatum.blogspot.com/2024/01/explaining-performance-regression-in.html">this blog post</a>) there is only one good index for the statement. So I prefer the optimizer not do too much work in that case. I have experienced this problem a few times with MySQL. 
One of the fixes from upstream MySQL was to change the optimizer to do less work when there was a FORCE INDEX hint. And with some OLTP workloads where the same statements are so frequent I really don't want the optimizer to use extra CPU time. For the same reason, I get much better throughput from Postgres when prepared statements are enabled and now I always enable them for the range and point queries with Postgres during the insert benchmark, but not for MySQL (because they don't help much with MySQL).</p><p><b>Build + Configuration</b></p><div><div>See the <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to.html">previous report</a> for more details. In all but one case (IO-bound on SER7) I tested these versions: 9.0.23, 9.1.24, 9.2.24, 9.3.25, 9.4.26, 9.5.25, 9.6.24, 10.23, 11.22, 12.17, 13.13, 14.10, 15.5, 16.1. For IO-bound on SER7 tests are from Postgres 12 through 16.</div><div><br />The configuration files are in subdirectories <a href="https://github.com/mdcallag/mytools/tree/master/bench/conf/nuc8i7.ub1804">from here</a>. Search for files named <i>conf.diff.cx9a2_bee</i> and <i>conf.diff.cx9a2_ser7</i> which exist for each major version of Postgres<i>.</i></div><div><br /></div><div><b>The Benchmark</b></div><div><br /></div><div>The benchmark is run with one client.</div><div><br /></div><div>There are two test servers and the SER7 has a faster CPU. More info on the servers <a href="https://smalldatum.blogspot.com/2022/10/small-servers-for-performance-testing-v4.html">is here</a>:</div><div><ul style="text-align: left;"><li>Beelink SER4 with 8 AMD cores, 16G RAM, Ubuntu 22.04 and XFS using 1 m.2 device</li><li>Beelink SER7 with 8 AMD cores, 32G RAM, Ubuntu 22.04 and XFS using 1 m.2 device</li></ul></div><div>The benchmark steps are:<div><p></p><div><ul><li>l.i0</li><ul><li>insert X million rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client. 
With a cached workload the value of X is 30M for SER4 and 60M for SER7. With an IO-bound workload X is 800M for both because I forgot to make it larger for SER7.</li></ul><li>l.x</li><ul><li>create 3 secondary indexes per table. There is one connection per client.</li></ul><li>l.i1</li><ul><li>use 2 connections/client. One inserts XM rows and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate. With a cached workload the value of X is 40M. With an IO-bound workload X is 4M.</li></ul><li>l.i2</li><ul><li>like l.i1 but each transaction modifies 5 rows (small transactions) and inserts XM rows total. With a cached workload the value of X is 10M. With an IO-bound workload X is 1M.</li><li>Vacuum the test table, do a checkpoint and wait ~Y seconds to reduce variance during the read-write benchmark steps that follow. The value of Y is based on the size of the table.</li></ul><li>qr100</li><ul><li>use 3 connections/client. One does range queries for Z seconds and performance is reported for this. The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. This step is run for a fixed amount of time. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested. For cached workloads the value of Z was 1800. 
For IO-bound on SER4 it was 1800 and on SER7 it was 7200.</li></ul><li>qp100</li><ul><li>like qr100 except uses point queries on the PK index</li></ul><li>qr500</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qp500</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qr1000</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul><li>qp1000</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul></ul></div></div></div></div><div><b>Results</b></div><div><br /></div><div>The performance reports are here for:</div><div><ul style="text-align: left;"><li>Cached: <a href="https://mdcallag.github.io/reports/24_02_12.1u.1tno.mem.bee.pg/all.html">SER4</a> and <a href="https://mdcallag.github.io/reports/24_02_12.1u.1tno.mem.ser7.pg/all.html">SER7</a></li><li>IO-bound: <a href="https://mdcallag.github.io/reports/24_02_12.1u.1tno.io.bee.pg/all.html">SER4</a> and <a href="https://mdcallag.github.io/reports/24_02_12.1u.1tno.io.ser7.pg/all.html">SER7</a></li></ul></div><div>The summary has 3 tables. The first shows absolute throughput for each DBMS tested and each benchmark step. The second has throughput relative to the version on the first row of the table. The third shows the background insert rate for benchmark steps with background inserts; all systems sustained the target rates. The second table makes it easy to see how performance changes over time.</div><div><br /></div><div>Below I use relative QPS to explain how performance changes. It is: (QPS for $me / QPS for $base) where $me is my version and $base is the version of the base case. When relative QPS is > 1.0 then performance improved over time. When it is < 1.0 then there are regressions. 
The Q in relative QPS measures: </div><div><ul><li>insert/s for l.i0, l.i1, l.i2</li><li>indexed rows/s for l.x</li><li>range queries/s for qr100, qr500, qr1000</li><li>point queries/s for qp100, qp500, qp1000</li></ul><div>Below I use colors to highlight the relative QPS values with <span style="background-color: #f4cccc;">red</span> for <= 0.95, <span style="background-color: #d9ead3;">green</span> for >= 1.05 and <span style="background-color: #eeeeee;">grey</span> for values between 0.95 and 1.05.</div><div><br /></div><div>From <a href="https://mdcallag.github.io/reports/24_02_12.1u.1tno.mem.bee.pg/all.html#summary">the summary</a> for SER4 with a cached workload</div></div><div><ul><li>The base case is <span style="text-align: right;">pg9023_def which means Postgres 9.0.23</span></li><li><span style="text-align: right;">For the benchmark steps</span></li><ul><li><span style="text-align: right;">l.i0 - improves in Postgres 9.4 and 11.22 and then is stable</span></li><li><span style="text-align: right;">l.x - improves in Postgres 9.6 and then is stable</span></li><li><span style="text-align: right;">l.i1, l.i2 - </span>improves in Postgres 12 through 14</li><li>qr100, qr500, qr1000 - slow but steady improvements from Postgres 9.2 through 16</li><li>qp100, qp500 - slow but steady improvements from Postgres 9.2 through 16</li><li>qp1000 - stable from Postgres 9 through 16. 
Perhaps this is most affected by the CPU overhead from get_actual_variable_range.</li></ul><li>Comparing throughput in Postgres 16.2 to 9.0.23</li><ul><li>Write-heavy</li><ul><li>l.i0, l.x, l.i1, l.i2 - relative QPS is <span style="background-color: #d9ead3;">1.22</span><span style="background-color: white;">, </span><span style="background-color: #d9ead3;">1.79</span><span style="background-color: white;">, </span><span style="background-color: #d9ead3;">3.38</span><span style="background-color: white;">, </span><span style="background-color: #d9ead3;">2.36</span></li></ul><li><span style="background-color: white;">Range queries</span></li><ul><li>qr100, qr500, qr1000 - relative QPS is<span style="background-color: white;"> </span><span style="background-color: #d9ead3;">1.20</span><span style="background-color: white;">, </span><span style="background-color: #d9ead3;">1.19</span><span style="background-color: white;">, </span><span style="background-color: #d9ead3;">1.23</span></li></ul><li><span style="background-color: white;">Point queries</span></li><ul><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #d9ead3;">1.08</span>, <span style="background-color: #d9ead3;">1.10</span>, <span style="background-color: #eeeeee;">1.01</span></li></ul></ul></ul><div>From <a href="https://mdcallag.github.io/reports/24_02_12.1u.1tno.mem.ser7.pg/all.html#summary">the summary</a> for SER7 with a cached workload</div><div><ul style="text-align: left;"><li>The base case is <span style="text-align: right;">pg9023_def which means Postgres 9.0.23</span></li><li><span style="text-align: right;">For the benchmark steps</span></li><ul><li><span style="text-align: right;">l.i0 - improves in Postgres 9.4 and 11.22 and then is stable</span></li><li><span style="text-align: right;">l.x - improves in Postgres 9.6 and 10 and then is stable</span></li><li><span style="text-align: right;">l.i1, l.i2 - </span>improves in Postgres 12 through 14</li><li>qr100, qr500, 
qr1000 - improves in Postgres 9.2 and then is stable or slowly improving</li><li>qp100, qp500, qp1000 - improves in Postgres 9.2 through 9.5 and then slowly gets worse</li></ul><li>Comparing throughput in Postgres 16.2 to 9.0.23</li><ul><li>Write-heavy</li><ul><li>l.i0, l.x, l.i1, l.i2 - relative QPS is <span style="background-color: #d9ead3;">1.34</span><span style="background-color: white;">, </span><span style="background-color: #d9ead3;">1.65</span><span style="background-color: white;">, </span><span style="background-color: #d9ead3;">3.12</span><span style="background-color: white;">, </span><span style="background-color: #d9ead3;">2.42</span></li></ul><li><span style="background-color: white;">Range queries</span></li><ul><li>qr100, qr500, qr1000 - relative QPS is<span style="background-color: white;"> </span><span style="background-color: #d9ead3;">1.44</span><span style="background-color: white;">, </span><span style="background-color: #d9ead3;">1.62</span><span style="background-color: white;">, </span><span style="background-color: #d9ead3;">1.62</span></li></ul><li><span style="background-color: white;">Point queries</span></li><ul><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #eeeeee;">1.01</span>, <span style="background-color: #eeeeee;">0.92</span>, <span style="background-color: #eeeeee;">0.99</span></li></ul></ul></ul></div><div>From <a href="https://mdcallag.github.io/reports/24_02_12.1u.1tno.io.bee.pg/all.html#summary">the summary</a> for SER4 with an IO-bound workload</div><div><ul style="text-align: left;"><li>The base case is <span style="text-align: right;">pg9023_def which means Postgres 9.0.23</span></li><li><span style="text-align: right;">For the benchmark steps</span></li><ul><li><span style="text-align: right;">l.i0 - improves in Postgres 11.22 and then is stable</span></li><li><span style="text-align: right;">l.x - improves in Postgres 9.4 through 10 and then is stable</span></li><li><span style="text-align: 
right;">l.i1, l.i2 - </span>improves in Postgres 12 and then is stable</li><li>qr100, qr500, qr1000 - slowly improves from Postgres 9.2 through 11 and then is stable</li><li>qp100, qp500, qp1000 - same as qr100, qr500, qr1000</li></ul><li>Comparing throughput in Postgres 16.2 to 9.0.23</li><ul><li>Write-heavy</li><ul><li>l.i0, l.x, l.i1, l.i2 - relative QPS is <span style="background-color: #d9ead3;">1.21</span><span style="background-color: white;">, </span><span style="background-color: #d9ead3;">2.29</span><span style="background-color: white;">, </span><span style="background-color: #d9ead3;">1.85</span><span style="background-color: white;">, </span><span style="background-color: #d9ead3;">1.85</span></li></ul><li><span style="background-color: white;">Range queries</span></li><ul><li>qr100, qr500, qr1000 - relative QPS is<span style="background-color: white;"> </span><span style="background-color: #d9ead3;">1.15</span><span style="background-color: white;">, </span><span style="background-color: #d9ead3;">1.23</span><span style="background-color: white;">, </span><span style="background-color: #d9ead3;">1.36</span></li></ul><li><span style="background-color: white;">Point queries</span></li><ul><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #eeeeee;">0.99</span>, <span style="background-color: #eeeeee;">1.00</span>, <span style="background-color: #eeeeee;">0.98</span></li></ul></ul></ul></div><div>From <a href="https://mdcallag.github.io/reports/24_02_12.1u.1tno.io.ser7.pg/all.html#summary">the summary</a> for SER7 with an IO-bound workload</div><div><ul style="text-align: left;"><li>I only have results from Postgres 12 through 16</li><li>The read-write benchmark steps were run for 7200s vs 1800s above</li><li>Looking at write rates over time for the l.i2 benchmark step, where writes are inserts/s and deletes/s, the rates are ~195/s at the start of the benchmark step and ~155/s at the end. 
I assume the issue is that there is more garbage (non-visible index entries) in the PK index over time, so there is more CPU overhead from get_actual_variable_range having to read and skip them while figuring out the minimum visible value in the index during DELETE statements.</li><li>The base case is <span style="text-align: right;">pg1217_def which means Postgres 12.17. The improvements shown here don't match the results above for IO-bound on the SER4 server because it uses an older (9.0) base case</span></li><li><span style="text-align: right;">For the benchmark steps</span></li><ul><li><span style="text-align: right;">l.i0 - throughput is stable</span></li><li><span style="text-align: right;">l.x - throughput slowly improves from Postgres 13 through 16</span></li><li><span style="text-align: right;">l.i1, l.i2 - with some variance, throughput gets worse from Postgres 12 through 16. From vmstat results normalized by write rates I see a 4% to 7% increase in CPU/operation <a href="https://mdcallag.github.io/reports/24_02_12.1u.1tno.io.ser7.pg/all.html#l.i1.metrics">on SER7</a>. 
If I limit myself to Postgres 12.17 through 16 then I also see a 5% to 8% increase <a href="https://mdcallag.github.io/reports/24_02_12.1u.1tno.io.bee.pg/all.html#l.i1.metrics">on SER4</a>.</span></li><li>qr100, qr500, qr1000 - throughput is stable</li><li>qp100, qp500, qp1000 - throughput is stable</li></ul><li>Comparing throughput in Postgres 16.2 to 12.17</li><ul><li>Write-heavy</li><ul><li>l.i0, l.x, l.i1, l.i2 - relative QPS is <span style="background-color: #d9ead3;">1.00</span><span style="background-color: white;">, </span><span style="background-color: #d9ead3;">1.14</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.93</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.88</span></li></ul><li><span style="background-color: white;">Range queries</span></li><ul><li>qr100, qr500, qr1000 - relative QPS is<span style="background-color: white;"> </span><span style="background-color: #eeeeee;">1.02</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">1.00</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">1.00</span></li></ul><li><span style="background-color: white;">Point queries</span></li><ul><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #eeeeee;">0.98</span>, <span style="background-color: #eeeeee;">0.98</span>, <span style="background-color: #eeeeee;">0.98</span></li></ul></ul></ul></div><div><b>Target write rates</b></div><div><b><br /></b></div><div>The third table in the summaries linked above shows the write rates sustained during the read-write benchmark steps. The target write rates are 100/s for qr100 and qp100, 500/s for qr500 and qp500 and then 1000/s for qr1000 and qp1000. Note that X/s means X inserts/s and X deletes/s. When the value is close enough to the target then I assume the target has been sustained. 
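The "close enough" check amounts to a few lines of Python; the 2% tolerance below is my illustrative assumption, not an exact constant used by the reports:

```python
# Decide whether a read-write benchmark step sustained its target write rate.
# The 2% tolerance is an assumption for illustration; the summary tables do
# not use this exact constant.
def sustained(target_per_sec, measured_per_sec, tolerance=0.02):
    return measured_per_sec >= target_per_sec * (1.0 - tolerance)

print(sustained(1000, 995))  # close enough to the qp1000 target
print(sustained(1000, 900))  # missed the qp1000 target
```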
The table cells in red indicate the cases where the target has not been sustained.</div><div><br /></div><div>For cached workloads all versions sustained the target write rates.</div><div><br /></div><div>For IO-bound workloads</div><div><ul style="text-align: left;"><li> Note that SER4 and SER7 had the same amount of data, but SER7 has twice as much RAM so it was less IO-bound. And SER7 has a faster CPU.</li><li>With SER4</li><ul><li>Postgres 9.x, 10 and 11 did not sustain the target write rates during qr1000 and qp1000</li><li>Postgres 13, 14, 15.5 and 16 did not sustain the target write rates during qp1000 but they were close to the target</li></ul><li>With SER7</li><ul><li>All versions sustained the target write rates with SER7.</li></ul></ul></div><div><div><br /></div></div><div><br /></div><div><br /></div></div><p><br /><br /></p>Mark Callaghanhttp://www.blogger.com/profile/09590445221922043181noreply@blogger.com0tag:blogger.com,1999:blog-9149523927864751087.post-17478525578674828632024-02-01T11:56:00.000-08:002024-02-01T11:56:54.362-08:00Updated Insert benchmark: InnoDB/MySQL 5.6, 5.7 and 8.0, small server, IO-bound database<p>This has results for MySQL with InnoDB vs the <a href="https://smalldatum.blogspot.com/2023/12/updates-for-insert-benchmark-december.html">updated Insert Benchmark</a> with an IO-bound workload and 8-core server with results from MySQL versions 5.6 through 8.0. 
Recent results from a cached workload and the same server <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-innodbmysql-56.html">are here</a>.</p><p>tl;dr</p><p></p><ul style="text-align: left;"><li>Regressions here with the IO-bound workload are smaller than with a <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-innodbmysql-56.html">cached workload</a> because the extra IO latency often dominates the extra CPU overhead that arrives in modern MySQL.</li><li>Regressions tend to be large between major versions (5.6 -> 5.7, 5.7 -> 8.0). While they were small within 5.6 and 5.7 (5.6.21 -> 5.6.51, 5.7.10 -> 5.7.44), they were also large within 8.0. </li><li>The perf schema continues to have performance problems. The biggest problems are a soon-to-be-fixed bug for parallel create index (<a href="https://smalldatum.blogspot.com/2023/12/create-innodb-indexes-2x-faster-with.html">see here</a>) and a ~15% drop in range query throughput. The drop is larger for range queries than for point queries in this workload because the point queries are much more IO-bound so the IO latency hides the cost of the perf schema.</li></ul><div>Comparing MySQL 8.0.36 with 5.6.21</div><div><ul style="text-align: left;"><li>Initial load (l.i0) throughput is <span style="background-color: #f4cccc;">~2X larger</span> in 5.6</li><li>Write only (l.i1, l.i2) throughput is <span style="background-color: #d9ead3;">~1.2X larger</span> in 8.0</li><li>Range queries (qr*) throughput is <span style="background-color: #f4cccc;">much smaller</span> in 8.0</li><li>Point queries (qp*) throughput is between <span style="background-color: #f4cccc;">~9% smaller</span> <span style="background-color: #eeeeee;">and similar</span><span style="background-color: white;"> in 8.0</span></li></ul></div><p></p><div><b>Build + Configuration</b></div><div><div><div><br /></div><div>I tested many versions of MySQL 5.6, 5.7 and 8.0. These were compiled from source. 
I used the CMake files <a href="https://github.com/mdcallag/mytools/tree/master/bench/build/dec23.cmk.patch.mysql">from here</a> with the <a href="https://github.com/mdcallag/mytools/tree/master/bench/build/dec23.cmk.patch.mysql">patches here</a> to fix problems that otherwise prevent compiling older MySQL releases on modern Ubuntu. In all cases I use the <i><b>rel</b></i> build that uses CMAKE_BUILD_TYPE =Release.<br /><br />I used the cz10a_bee my.cnf files that are here <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/nuc8i7.ub1804/my56/my.cnf.cz10a_bee">for 5.6</a>, <a href="https://github.com/mdcallag/mytools/tree/master/bench/conf/nuc8i7.ub1804/my57">for 5.7</a> and <a href="https://github.com/mdcallag/mytools/tree/master/bench/conf/nuc8i7.ub1804/my80/etc">for 8.0</a>. For 5.7 and 8.0 there are many variants of that file to make them work on a range of the point releases.</div></div><div><br /></div><div>The versions I tested are:</div><div><ul><li>5.6</li><ul><li>5.6.21, 5.6.31, 5.6.41, 5.6.51</li></ul><li>5.7</li><ul><li>5.7.10, 5.7.20, 5.7.30, 5.7.44</li></ul><li>8.0</li><ul><li>8.0.13, 8.0.14, 8.0.20, 8.0.28, 8.0.35, 8.0.36</li></ul></ul><div>For 8.0.35 I tested a few variations from what is described above to understand the cost of the performance schema:</div></div><div><ul><li><span style="text-align: right;">my8035_rel.cz10aps0_bee</span></li><ul><li><span style="text-align: right;">this uses my.cnf.cz10aps0_bee which is the same as my.cnf.cz10a_bee except it adds performance_schema =0</span></li></ul><li><span style="text-align: right;">my8035_rel_lessps.cz10a_bee</span></li><ul><li><span style="text-align: right;">the build disables as much as possible of the performance schema. 
The CMake file <a href="https://github.com/mdcallag/mytools/blob/master/bench/build/dec23.cmk.patch.mysql/mysql-8.0.35/cmk.80.rel_lessps">is here</a>.</span></li></ul></ul><div style="text-align: right;"><div style="text-align: left;"><b>The Benchmark</b></div><div style="text-align: left;"><br /></div><div style="text-align: left;">The test server is <a href="https://smalldatum.blogspot.com/2022/10/small-servers-for-performance-testing-v4.html">described here</a>. It is a Beelink SER4 with 8 cores, 16G RAM and Ubuntu 22.04. Storage is an m.2 device with XFS and discard enabled. </div><div style="text-align: left;"><br />The benchmark is <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to.html">explained here</a> and is run with 1 client. The benchmark steps are:<div><p></p><div><ul><li>l.i0</li><ul><li>insert 800 million rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client.</li></ul><li>l.x</li><ul><li>create 3 secondary indexes per table. There is one connection per client.</li></ul><li>l.i1</li><ul><li>use 2 connections/client. One inserts 4M rows and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate.</li></ul><li>l.i2</li><ul><li>like l.i1 but each transaction modifies 5 rows (small transactions) and 1M rows total</li><li>Work and waiting are done at the end of this step to reduce write-back debt</li></ul><li>qr100</li><ul><li>use 3 connections/client. One does range queries for 1800 seconds and performance is reported for this. The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. This step is run for a fixed amount of time. 
If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested.</li></ul><li>qp100</li><ul><li>like qr100 except uses point queries on the PK index</li></ul><li>qr500</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qp500</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qr1000</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul><li>qp1000</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul></ul><div><div><b>Results</b></div><div><br /></div><div>The performance reports are here for <a href="https://mdcallag.github.io/reports/24_o2_01.1u.1tno.io.bee.my.56/all.html">MySQL 5.6</a>, <a href="https://mdcallag.github.io/reports/24_o2_01.1u.1tno.io.bee.my.57/all.html">MySQL 5.7</a>, <a href="https://mdcallag.github.io/reports/24_o2_01.1u.1tno.io.bee.my.80/all.html">MySQL 8.0</a> and <a href="https://mdcallag.github.io/reports/24_o2_01.1u.1tno.io.bee.my.all/all.html">MySQL 5.6 to 8.0</a>.<br /><br /></div><div><div>The summary has 3 tables. The first shows absolute throughput by DBMS tested per benchmark step. The second has throughput relative to the version on the first row of the table. The third shows the background insert rate for benchmark steps with background inserts and all systems sustained the target rates. The second table makes it easy to see how performance changes over time.</div><div><br /></div><div>Below I use relative QPS to explain how performance changes. It is: (QPS for $me / QPS for $base) where $me is my version and $base is the version of the base case. When relative QPS is > 1.0 then performance improved over time. When it is < 1.0 then there are regressions. 
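The relative QPS computation and the color thresholds used in these posts (red for <= 0.95, green for >= 1.05, grey in between) can be sketched in Python; the function names are mine:

```python
# Relative QPS is (QPS for my version / QPS for the base case). The color
# convention in these posts: red for <= 0.95, green for >= 1.05 and grey
# in between. Function names are mine, for illustration only.
def relative_qps(qps_me, qps_base):
    return qps_me / qps_base

def color(rel):
    if rel <= 0.95:
        return "red"    # regression
    if rel >= 1.05:
        return "green"  # improvement
    return "grey"       # roughly unchanged

print(color(relative_qps(920, 1000)))   # a regression
print(color(relative_qps(1210, 1000)))  # an improvement
```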
The Q in relative QPS measures: </div><div><ul><li>insert/s for l.i0, l.i1, l.i2</li><li>indexed rows/s for l.x</li><li>range queries/s for qr100, qr500, qr1000</li><li>point queries/s for qp100, qp500, qp1000</li></ul><div>Below I use colors to highlight the relative QPS values with <span style="background-color: #f4cccc;">red</span> for <= 0.95, <span style="background-color: #d9ead3;">green</span> for >= 1.05 and <span style="background-color: #eeeeee;">grey</span> for values between 0.95 and 1.05.</div></div></div><div><br /></div><div>From the summary <a href="https://mdcallag.github.io/reports/24_o2_01.1u.1tno.io.bee.my.56/all.html#summary">for 5.6</a></div><div><ul style="text-align: left;"><li>The base case is 5.6.21</li><li>Comparing 5.6.51 with 5.6.21</li><ul><li>l.i0 - relative QPS is <span style="background-color: #f4cccc;">0.92</span> in 5.6.51</li><li>l.x - relative QPS is <span style="background-color: #eeeeee;">1.01</span> in 5.6.51</li><li>l.i1, l.i2 - relative QPS is <span style="background-color: #eeeeee;">1.00</span>, <span style="background-color: #eeeeee;">0.99</span> in 5.6.51</li><li>qr100, qr500, qr1000 - relative QPS is <span style="background-color: #eeeeee;">0.98</span>, <span style="background-color: #eeeeee;">0.97</span>, <span style="background-color: #eeeeee;">0.98</span> in 5.6.51</li><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #eeeeee;">0.96</span>, <span style="background-color: #eeeeee;">0.96</span>, <span style="background-color: #eeeeee;">0.98</span> in 5.6.51</li></ul></ul></div><div><div>From the summary <a href="https://mdcallag.github.io/reports/24_o2_01.1u.1tno.io.bee.my.57/all.html#summary">for 5.7</a></div><div><ul style="text-align: left;"><li>The base case is 5.7.10</li><li>Comparing 5.7.44 with 5.7.10</li><ul><li>l.i0 - relative QPS is <span style="background-color: #eeeeee;">0.96</span> in 5.7.44</li><li>l.x - relative QPS is <span style="background-color: #eeeeee;">0.96</span> in 
5.7.44</li><li>l.i1, l.i2 - relative QPS is <span style="background-color: #eeeeee;">0.96</span>, <span style="background-color: #eeeeee;">0.98</span> in 5.7.44</li><li>qr100, qr500, qr1000 - relative QPS is <span style="background-color: #eeeeee;">0.95</span>, <span style="background-color: #eeeeee;">0.97</span>, <span style="background-color: #eeeeee;">0.97</span> in 5.7.44</li><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #eeeeee;">1.00</span>, <span style="background-color: #eeeeee;">0.99</span>, <span style="background-color: #eeeeee;">0.99</span> in 5.7.44</li></ul></ul></div></div><div><div>From the summary <a href="https://mdcallag.github.io/reports/24_o2_01.1u.1tno.io.bee.my.80/all.html#summary">for 8.0</a></div><div><ul><li>The base case is 8.0.13</li><li>Comparing 8.0.36 with 8.0.13</li><ul><li>l.i0 - <span style="background-color: white;">relative QPS is </span><span style="background-color: #f4cccc;">0.81</span> in 8.0.36</li><li>l.x - relative QPS is <span style="background-color: #eeeeee;">1.00</span> in 8.0.36</li><li>l.i1, l.i2 - <span style="background-color: white;">relative QPS is </span><span style="background-color: #eeeeee;">0.98,</span><span style="background-color: white;"> </span><span style="background-color: #f4cccc;">0.93</span><span style="background-color: white;"> in 8.0.36</span></li><li>qr100, qr500, qr1000 - <span style="background-color: white;">relative QPS is </span><span style="background-color: #eeeeee;">1.01</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">0.97,</span><span style="background-color: white;"> </span><span style="background-color: #f4cccc;">0.93</span><span style="background-color: white;"> in 8.0.36</span></li><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #eeeeee;">0.98</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">1.00</span><span style="background-color: 
white;"> and </span><span style="background-color: #eeeeee;">1.01</span> in 8.0.36</li></ul></ul><div>From the summary <a href="https://mdcallag.github.io/reports/24_o2_01.1u.1tno.io.bee.my.80/all.html#summary">for 8.0</a> but focusing on the 8.0.35 variations that disable the perf schema</div><div></div><div><ul style="text-align: left;"><li>Throughput for write-heavy steps (l.i0, l.i1, l.i2) is ~5% better</li><li>Throughput for parallel index create is ~1.5X better (<a href="https://smalldatum.blogspot.com/2023/12/create-innodb-indexes-2x-faster-with.html">read this</a>)</li><li>For read-write benchmark steps</li><ul><li>Throughput for range queries (qr*) is ~15% better</li><li>Throughput for point queries (qp*) is unchanged</li><li>The point query benchmark steps are a lot more IO-bound than the range query steps which might explain why the perf schema cost is larger for range queries here. See the rpq <a href="https://mdcallag.github.io/reports/24_o2_01.1u.1tno.io.bee.my.80/all.html#qr100.L1.metrics">column here</a> which is iostat reads per query and it is less than 0.2 for qr100 but larger than 9 for qp100. 
From the cpupq <a href="https://mdcallag.github.io/reports/24_o2_01.1u.1tno.io.bee.my.80/all.html#qr100.L1.metrics">column here</a>, which measures CPU per query, the perf schema increases CPU by up to 10% for point queries.</li></ul><li>To reduce perf schema overhead it is better to disable it at compile time than via my.cnf</li></ul></div><div><div>From the summary for <a href="https://mdcallag.github.io/reports/24_o2_01.1u.1tno.io.bee.my.all/all.html#summary">5.6, 5.7, 8.0</a></div><div><ul style="text-align: left;"><li>The base case is 5.6.21</li><li>The regressions here are smaller than for a <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-innodbmysql-56.html">cached workload</a> because the workload here is frequently IO-bound and the extra IO latency often dominates the extra CPU overhead that arrives in modern MySQL.</li><li>Comparing 5.7.44 and 8.0.36 with 5.6.21</li><ul><li>l.i0</li><ul><li>relative QPS is <span style="background-color: #f4cccc;">0.80</span> in 5.7.44</li><li>relative QPS is <span style="background-color: #f4cccc;">0.54</span> in 8.0.36</li></ul><li>l.x</li><ul><li>relative QPS is <span style="background-color: #d9ead3;">1.36</span> in 5.7.44</li><li>relative QPS is <span style="background-color: #d9ead3;">1.30</span> in 8.0.36</li></ul><li>l.i1, l.i2</li><ul><li>relative QPS is <span style="background-color: #d9ead3;">1.30</span><span style="background-color: white;">, </span><span style="background-color: #d9ead3;">1.25</span> in 5.7.44</li><li>relative QPS is <span style="background-color: #d9ead3;">1.29</span><span style="background-color: white;">, </span><span style="background-color: #d9ead3;">1.16</span> in 8.0.36</li></ul><li>qr100, qr500, qr1000</li><ul><li>relative QPS is <span style="background-color: #f4cccc;">0.73</span>, <span style="background-color: #f4cccc;">0.83</span>, <span style="background-color: #f4cccc;">0.92</span> in 5.7.44</li><li>relative QPS is <span style="background-color: 
#f4cccc;">0.68</span>, <span style="background-color: #f4cccc;">0.77</span>, <span style="background-color: #f4cccc;">0.85</span> in 8.0.36</li></ul><li>qp100, qp500, qp1000</li><ul><li>relative QPS is <span style="background-color: #eeeeee;">0.96</span>, <span style="background-color: #eeeeee;">0.96</span>, <span style="background-color: #eeeeee;">1.02</span> in 5.7.44</li><li>relative QPS is <span style="background-color: #f4cccc;">0.91</span>, <span style="background-color: #f4cccc;">0.93</span>, <span style="background-color: #eeeeee;">1.00</span> in 8.0.36</li></ul></ul></ul></div></div></div></div></div></div></div></div></div></div></div>Mark Callaghanhttp://www.blogger.com/profile/09590445221922043181noreply@blogger.com0tag:blogger.com,1999:blog-9149523927864751087.post-70837857561990203082024-02-01T10:34:00.000-08:002024-02-01T11:11:37.135-08:00Updated Insert benchmark: Postgres 9.x to 16.x, large server, cached database<p>This has results for Postgres vs the <a href="https://smalldatum.blogspot.com/2023/12/updates-for-insert-benchmark-december.html">updated Insert Benchmark</a> with a cached workload and 24-core server with results from Postgres versions 9.0 through 16.</p><p>tl;dr</p><p></p><ul style="text-align: left;"><li>Postgres does a great job at avoiding regressions over time</li><li>Postgres 16.1 is a lot faster than 9.0.23, between ~1.2X and ~10X depending on the workload</li></ul><p></p><p><b>Build + Configuration</b></p><div><div>See the <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to.html">previous report</a> for more details. I used these versions: 9.0.23, 9.1.24, 9.2.24, 9.3.25, 9.4.26, 9.5.25, 9.6.24, 10.23, 11.22, 12.17, 13.13, 14.10, 15.5, 16.1. </div><div><br />The configuration files are in subdirectories <a href="https://github.com/mdcallag/mytools/tree/master/bench/conf/nuc8i7.ub1804">from here</a>. 
Search for files named <i>conf.diff.cx9a2_c24r64</i>, which exist for each major version of Postgres<i>.</i></div></div><div><i><br /></i></div><div><div><b>The Benchmark</b></div><div><br /></div><div>The benchmark is <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to.html">explained here</a> and is run with 16 clients.</div><div><br /></div><div>The test server is <a href="https://smalldatum.blogspot.com/2022/10/small-servers-for-performance-testing-v4.html">described here</a>. It is a SuperMicro SuperWorkstation 7049A-T with 2 sockets, 24 cores/socket, hyperthreading disabled, 64G RAM and an NVMe SSD. It runs Ubuntu 22.04 and the database filesystem uses XFS with discard enabled.</div><div><br />The benchmark steps are:<div><p></p><div><ul><li>l.i0</li><ul><li>insert 20 million rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client.</li></ul><li>l.x</li><ul><li>create 3 secondary indexes per table. There is one connection per client.</li></ul><li>l.i1</li><ul><li>use 2 connections/client. One inserts 16M rows and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate.</li></ul><li>l.i2</li><ul><li>like l.i1 but each transaction modifies 5 rows (small transactions) and 4M rows total</li><li>Waiting, vacuum and checkpoint are done at the end of this test step to reduce variance in the steps that follow.</li></ul><li>qr100</li><ul><li>use 3 connections/client. One does range queries for 1800 seconds and performance is reported for this. The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. This step is run for a fixed amount of time. 
If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested.</li></ul><li>qp100</li><ul><li>like qr100 except uses point queries on the PK index</li></ul><li>qr500</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qp500</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qr1000</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul><li>qp1000</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul></ul><div><div><b>Results</b></div><div><br /></div><div>The performance report <a href="https://mdcallag.github.io/reports/24_02_01.16u.1tno.socket2.mem.pg/all.html">is here</a>.</div><div><br /></div><div>The summary has 3 tables. The first shows absolute throughput by DBMS tested per benchmark step. The second has throughput relative to the version on the first row of the table. The third shows the insert rate for the read-write benchmark steps that have background inserts and deletes and all systems sustained the target rates. The second table makes it easy to see how performance changes over time.</div><div><br /></div><div>Below I use relative QPS to explain how performance changes. It is: (QPS for $me / QPS for $base) where $me is my version and $base is the version of the base case. When relative QPS is > 1.0 then performance improved over time. When it is < 1.0 then there are regressions. 
The Q in relative QPS measures: </div><div><ul><li>insert/s for l.i0, l.i1, l.i2</li><li>indexed rows/s for l.x</li><li>range queries/s for qr100, qr500, qr1000</li><li>point queries/s for qp100, qp500, qp1000</li></ul><div>Below I use colors to highlight the relative QPS values with <span style="background-color: #f4cccc;">red</span> for <= 0.95, <span style="background-color: #d9ead3;">green</span> for >= 1.05 and <span style="background-color: #eeeeee;">grey</span> for values between 0.95 and 1.05.</div><div><br /></div><div>From <a href="https://mdcallag.github.io/reports/24_02_01.16u.1tno.socket2.mem.pg/all.html#summary">the summary</a>:</div></div><div><ul><li>The base case is <span style="text-align: right;">pg9023_def which means Postgres 9.0.23</span></li><li><span style="text-align: right;">For most of the read-write benchmark steps throughput improves a lot from 9.1.24 to 9.2.24 and has been stable since then. The exception is the last step (qp1000) for which throughput is flat. It might be that writeback and/or vacuum hurts query throughput by that point.</span></li><li><span style="text-align: right;">For the write-heavy steps (l.i0, l.x, l.i1, l.i2) throughput improves a lot</span></li><ul><li><span style="text-align: right;">l.i0 - things get a lot better in Postgres 9.4.26</span></li><li><span style="text-align: right;">l.x - things get worse from 9.3.25 through 10.23 and then improve with 11.22</span></li><li><span style="text-align: right;">l.i1 - things get a lot better in Postgres 9.5.25 and then again in 12.17</span></li><li><span style="text-align: right;">l.i2 - things get better in 9.5, worse in 9.6 through 11, better in 12 and then are stable. 
I assume most of the changes are from problems and improvements related to query planner CPU overhead during DELETE statements (<a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to_10.html">see the comments</a> about get_actual_variable_range)</span></li></ul><li>Comparing throughput in Postgres 16.1 to 9.0.23</li><ul><li>Write-heavy</li><ul><li>l.i0, l.x, l.i1, l.i2 - relative QPS is <span style="background-color: #d9ead3;">3.12</span><span style="background-color: white;">, </span><span style="background-color: #d9ead3;">1.17</span><span style="background-color: white;">, </span><span style="background-color: #d9ead3;">10.42</span><span style="background-color: white;">, </span><span style="background-color: #d9ead3;">2.14</span></li></ul><li><span style="background-color: white;">Range queries</span></li><ul><li>qr100, qr500, qr1000 - relative QPS is <span style="background-color: #d9ead3;">1.23</span>, <span style="background-color: #d9ead3;">1.34</span>, <span style="background-color: #d9ead3;">1.49</span></li></ul><li><span style="background-color: white;">Point queries</span></li><ul><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #d9ead3;">1.25</span>, <span style="background-color: #d9ead3;">1.29</span>, <span style="background-color: #d9ead3;">1.46</span></li></ul></ul></ul><div><br /></div></div></div></div></div></div></div>Mark Callaghanhttp://www.blogger.com/profile/09590445221922043181noreply@blogger.com0tag:blogger.com,1999:blog-9149523927864751087.post-66585183690443484822024-01-28T11:51:00.000-08:002024-01-28T15:52:11.701-08:00Explaining a performance regression in Postgres 14<p>I am trying to explain a performance regression that arrives in Postgres 14 during the <a href="https://smalldatum.blogspot.com/2023/12/updates-for-insert-benchmark-december.html">Insert Benchmark</a>.</p><p>The primary problem appears to be more CPU used by the query planner for DELETE statements when the 
predicates in the WHERE clause have constants that fall into either the max or min histogram bucket for a given column. An example is a DELETE statement like the following, where <i>transactionid</i> is the primary key so there is an index on it.<br /></p><blockquote><span style="font-family: inherit;">delete from t1 where (transactionid>=100 and transactionid<110)</span></blockquote><p></p><p>The table is used like a queue -- inserts are done in increasing order with respect to <i>transactionid</i> and when N rows are inserted, then N more rows are deleted to keep the size of the table constant. The rows to be deleted are the N rows with the smallest value for <i>transactionid</i>.<br /><br />The problem is worse for IO-bound workloads (<a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to_27.html">see here</a>) than for cached workloads (<a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to_10.html">see here</a>) probably because the extra work done by the query planner involves accessing the index and possibly reading data from storage.<br /><br />It is always possible I am doing something wrong but I suspect there is a fixable performance regression in Postgres 14 for this workload. The workload is <a href="https://smalldatum.blogspot.com/2023/12/updates-for-insert-benchmark-december.html">explained here</a> and note that <i>vacuum (analyze)</i> is done between the write-heavy and read-heavy benchmark steps.</p><p>There are three issues:</p><p></p><ol style="text-align: left;"><li>There is only one good index for the DELETE statement, yet the query planner does (too much) work to figure out the selectivity for that index.</li><li>When the constants used in WHERE clause predicates fall into the largest or smallest histogram bucket for a column, then the query planner reads from the index to figure out the real min or max value in the index. 
The code for this is in the function get_actual_variable_range.</li><li>Extra work is done while reading from the index because there are too many entries that can be removed by vacuum but have not yet been removed. So the index scan encounters and then skips them for a while until it reaches a visible entry.</li></ol><p></p><p>Issue #3 is made worse by the workload. The table is used like a queue. There is a sequence for the PK column and inserts are done in ascending order, getting new values from that sequence. Deletes are done at the other end of the table -- each delete statement deletes the N rows with the smallest value for the PK column. Similar problems can occur with InnoDB and MyRocks -- I know from experience.<br /><br />I suspect the solution in this case is to not try as hard to figure out selectivity when there is only one good index (fix issue #1), although it might help to do something about issue #2 as well.</p><p><b>Request 1</b></p><p>Can the query planner do less work when there is only one index that should be used? 
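A toy Python model (not Postgres code) of issues #2 and #3: when the table is used like a queue, a probe for the actual minimum visible key, which is what get_actual_variable_range does, must first skip every dead index entry left behind by the deletes:

```python
# Toy model of probing for the minimum visible key in a queue-like table:
# rows with the smallest PK values were deleted but their index entries are
# not yet removed by vacuum, so an ascending index scan must skip all of
# them before finding a visible entry. This is an illustration, not the
# actual Postgres implementation.
entries = [(pk, pk >= 5000) for pk in range(10_000)]  # (key, visible)

def probe_min_visible(index):
    skipped = 0
    for key, visible in index:
        if visible:
            return key, skipped
        skipped += 1  # dead entry: CPU (and maybe IO) spent for nothing
    return None, skipped

key, skipped = probe_min_visible(entries)
print(key, skipped)  # the probe skipped 5000 dead entries to find key 5000
```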
The full DDL for the table <a href="https://gist.github.com/mdcallag/bb2b01d4c52a7a25929125f48aa22644">is here</a>.</p><p>An abbreviated version of the DDL is below and the PK is on transactionid which uses a sequence.</p><div style="text-align: left;"><span style="font-family: courier;"> Column | Type |<br />----------------+-----------------------------+<br /> transactionid | bigint |<br /> dateandtime | timestamp without time zone |<br /> cashregisterid | integer |<br /> customerid | integer |<br /> productid | integer |<br /> price | integer |<br /> data | character varying(4000) |<br />Indexes:<br /> "pi1_pkey" PRIMARY KEY, btree (transactionid)<br /> "pi1_marketsegment" btree (productid, customerid, price)<br /> "pi1_pdc" btree (price, dateandtime, customerid)<br /> "pi1_registersegment" btree (cashregisterid, customerid, price)<br />Access method: heap</span></div><p>For a DELETE statement like the following, the only efficient index is pi1_pkey. So I prefer that the query planner do less work to figure that out.</p><p></p><blockquote>delete from t1 where (transactionid>=100 and transactionid<110)</blockquote><p></p><p><b>CPU overhead</b></p><p>When I run the Insert Benchmark there are 6 read-write benchmark steps -- 3 that do range queries as fast as possible, 3 that do point queries as fast as possible. For all of them there are also inserts and deletes done concurrent with the range queries and they are rate limited -- first at 100 inserts/s and 100 deletes/s, then at 500 inserts/s and 500 deletes/s and finally at 1000 inserts/s and 1000 deletes/s. So the work for writes (inserts & deletes) is fixed per benchmark step while the work done by queries is not. Also, for each benchmark step there are three connections -- one for queries, one for inserts, one for deletes. 
<br /><br />Using separate connections makes it easier to spot changes in CPU overhead and below I show the number of CPU seconds for the range query benchmark steps (qr100, qr500, qr1000) where the number indicates the write (insert & delete) rate. Results are provided for Postgres 13.13 and 14.10 from the benchmark I <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to_27.html">described here</a> (small server, IO-bound).</p><p>From below I see two problems. First, the CPU overhead for the delete connection is much larger with Postgres 14.10 for all benchmark steps (qr100, qr500, qr1000). Second, the CPU overhead for the query connection is much larger with Postgres 14.10 for qr1000, the benchmark step with the largest write rate.</p><div style="text-align: left;"><span style="font-family: courier;">Legend<br />* ins = connection that does inserts<br />* del = connection that does deletes<br />* query = connection that does range queries</span></div><div style="text-align: left;"><span style="font-family: courier;"><br />CPU seconds with 100 inserts/s, 100 deletes/s -> qr100<br /> ins del query<br />13.13 5 14 1121<br />14.10 15 187 1148<br /><br /></span></div><div style="text-align: left;"><span style="font-family: courier;">CPU seconds with 500 inserts/s, 500 deletes/s -> qr500<br /> ins del query<br />13.13 71 71 1128<br />14.10 73 1050 1144<br /><br /></span></div><div style="text-align: left;"><span style="font-family: courier;">CPU seconds with 1000 inserts/s, 1000 deletes/s -> qr1000<br /> ins del query<br />13.13 135 1113 1129<br />14.10 151 2912 1906</span></div><p style="text-align: left;"><b>Debugging after the fact: CPU profiling</b></p><p style="text-align: left;">I repeated the benchmark for Postgres 13.13 and 14.10 and after it finished repeated the qr100 benchmark step a few times for each of Postgres 13.13 and 14.10. 
The things that I measure here don't match exactly what happens during the benchmark because the database might be in a better state with respect to write back and vacuum.</p><p style="text-align: left;">While this is far from scientific, I used explain analyze on a few DELETE statements some time after they were used. The results <a href="https://gist.github.com/mdcallag/2fcd0ce762592420e8e49fe1dfd5696f">are here</a>. I repeated the statement twice for each Postgres release and the planning time for the first explain is 49.985ms for Postgres 13.13 vs 100.660ms for Postgres 14.10.<br /><br />So I assume the problem is the CPU overhead from the planner and not from executing the statement.</p><p style="text-align: left;">Then I looked at the CPU seconds used by the connection that does deletes after running for 10 minutes and it was ~50s for Postgres 13.13 vs ~71s for 14.10. So the difference at this point is large, but much smaller than what I report above which means the things I want to spot via CPU profiling might be harder to spot. Also, if the problem is IO latency rather than CPU overhead then CPU profiling won't be as useful.<br /><br /><a href="https://gist.github.com/mdcallag/1ee2b5972732efa6f588db82ae100dd4">This gist</a> has the top-5 call stacks from hierarchical profiling with perf for the connection that does deletes. 
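To put those measurements in perspective, the ratios work out as follows (plain arithmetic on the numbers quoted above):

```python
# Planning time for the first explain analyze of the DELETE statement
plan_ms_1313, plan_ms_1410 = 49.985, 100.660
print(f"planning time: {plan_ms_1410 / plan_ms_1313:.2f}x")  # ~2.01x in 14.10

# CPU seconds for the delete connection after 10 minutes of the repeated qr100 step
cpu_1313, cpu_1410 = 50.0, 71.0
print(f"delete-connection CPU: {cpu_1410 / cpu_1313:.2f}x")  # ~1.42x

# versus the full benchmark's qr100 step (CPU seconds table above: 14 vs 187)
print(f"full benchmark qr100: {187 / 14:.1f}x")  # ~13.4x, so the repeat understates the gap
```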
While there isn't an obvious difference between Postgres 13.13 and 14.10 there is something I don't like -- all stacks are from the query planner and include the function get_actual_variable_range.</p><p style="text-align: left;"><b>IO profiling</b></p><p style="text-align: left;">It looks like the query planner does more read IO for delete statements in Postgres 14.10 than in 13.13.</p><p style="text-align: left;">From the full benchmark I see the following for the range query benchmark steps: there is more read IO (see the rps column) with Postgres 14.10 for the qr100 and qr500 benchmark steps but not for the qr1000 benchmark step. And in all cases the range query rate (see the qps column) is significantly less with Postgres 14.10.</p><div style="text-align: left;"><span style="font-family: courier;">Legend:<br />* qps = range queries/s<br />* rps = read IO requests/s per iostat</span></div><div style="text-align: left;"><span style="font-family: courier;"><br /> qr100<br />version qps rps<br />13.13 8338.2 166.5<br />14.10 5822.6 183.5</span></div><div style="text-align: left;"><span style="font-family: courier;"><br /> qr500<br />version qps rps<br />13.13 8101.7 615.6<br />14.10 5917.9 885.6</span></div><div style="text-align: left;"><span style="font-family: courier;"><br /> qr1000<br />version qps rps<br />13.13 7090.1 1682.9<br />14.10 5139.0 1036.2</span></div><p style="text-align: left;"><br /></p>Mark Callaghanhttp://www.blogger.com/profile/09590445221922043181noreply@blogger.com0tag:blogger.com,1999:blog-9149523927864751087.post-84450785303131904952024-01-27T09:59:00.000-08:002024-01-27T09:59:25.008-08:00Updated Insert benchmark: Postgres 9.x to 16.x, small server, IO-bound database<p>This has results for Postgres vs the <a href="https://smalldatum.blogspot.com/2023/12/updates-for-insert-benchmark-december.html">Insert Benchmark</a> on a small server with an IO-bound workload.
I include results for the latest point release from all major versions from 9.0 to 16.</p><div>tl;dr</div><div><ul style="text-align: left;"><li>While there are no regressions in the CPU-bound (<a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to_24.html">cached</a>) workload there are regressions here</li><li>There are two changes related to get_actual_variable_range and get_actual_variable_endpoint (a <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to_10.html">previous post</a> also explained this). Note some parts of this workload are not typical and regressions I find here aren't relevant to many other workloads.</li><ul><li>Starting in Postgres 12 the throughput for l.i1 and l.i2 improves by ~2X because the CPU overhead from the query planner during DELETE statements has been reduced.</li><li>Starting in Postgres 14 the throughput for range queries decreases by ~30% because the CPU overhead for range queries and DELETE statements has grown. I am still debugging this.</li></ul><li>Most versions were unable to sustain the target write rates (1000 inserts/s and 1000 delete/s) during the qr1000 and qp1000 benchmark steps. 
Only Postgres 12.17 and 13.13 were able to sustain it, most others were far from the target and the worst were Postgres 14.10, 15.5 and 16.1.</li><li>Something changed for the worse in Postgres 14 that increases CPU overhead for queries and DELETE statements in this workload.</li></ul>Comparing throughput in Postgres 16.1 to 9.0.23<br /><ul style="text-align: left;"><li>Write-heavy - Postgres 16.1 is between 1.2X and 2.3X faster than 9.0.23</li><li><span style="background-color: white;">Range queries - Postgres 16.1 is up to ~20% slower than 9.0.23</span></li><li><span style="background-color: white;">Point queries - Postgres 16.1 is similar to 9.0.23</span></li></ul><div><ul></ul></div><div><p><b>Build + Configuration</b></p><div><div>See the <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to.html">previous report</a> for more details. I tested these versions: 9.0.23, 9.1.24, 9.2.24, 9.3.25, 9.4.26, 9.5.25, 9.6.24, 10.23, 11.22, 12.17, 13.13, 14.10, 15.5, 16.1. </div><div><br />The configuration files are in subdirectories <a href="https://github.com/mdcallag/mytools/tree/master/bench/conf/nuc8i7.ub1804">from here</a>. Search for files named <i>conf.diff.cx9a2_bee</i> which exist for each major version of Postgres<i>.</i></div><div><br /></div><div><b>The Benchmark</b></div><div><br /></div><div>The test server is a Beelink SER4 with 8 AMD cores, 16G RAM, Ubuntu 22.04 and XFS using 1 m.2 device.</div><div><br />The benchmark steps are:<div><p></p><div><ul><li>l.i0</li><ul><li>insert 800 million rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client.</li></ul><li>l.x</li><ul><li>create 3 secondary indexes per table. There is one connection per client.</li></ul><li>l.i1</li><ul><li>use 2 connections/client. One inserts 4M rows and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). 
This step is run for a fixed number of inserts, so the run time varies depending on the insert rate.</li></ul><li>l.i2</li><ul><li>like l.i1 but each transaction modifies 5 rows (small transactions) and inserts 1M rows total</li><li>Wait for X seconds after the step finishes to reduce variance during the read-write benchmark steps that follow.</li></ul><li>qr100</li><ul><li>use 3 connections/client. One does range queries for 1800 seconds and performance is reported for this. The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. This step is run for a fixed amount of time. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested.</li></ul><li>qp100</li><ul><li>like qr100 except uses point queries on the PK index</li></ul><li>qr500</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qp500</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qr1000</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul><li>qp1000</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul></ul><div><div><b>Results</b></div><div><br /></div><div>The performance report <a href="https://mdcallag.github.io/reports/24_01_27.1u.1tno.io.bee.pg/all.html">is here</a>.</div><div><br /></div><div>The summary has 3 tables. The first shows absolute throughput by DBMS tested X benchmark step. The second has throughput relative to the version on the first row of the table. The third shows the background insert rate for benchmark steps with background inserts and all systems sustained the target rates.
The second table makes it easy to see how performance changes over time.</div><div><br /></div><div>Below I use relative QPS to explain how performance changes. It is: (QPS for $me / QPS for $base) where $me is my version and $base is the version of the base case. When relative QPS is > 1.0 then performance improved over time. When it is < 1.0 then there are regressions. The Q in relative QPS measures: </div><div><ul><li>insert/s for l.i0, l.i1, l.i2</li><li>indexed rows/s for l.x</li><li>range queries/s for qr100, qr500, qr1000</li><li>point queries/s for qp100, qp500, qp1000</li></ul><div>Below I use colors to highlight the relative QPS values with <span style="background-color: #f4cccc;">red</span> for <= 0.95, <span style="background-color: #d9ead3;">green</span> for >= 1.05 and <span style="background-color: #eeeeee;">grey</span> for values between 0.95 and 1.05.</div><div><br /></div><div>From <a href="https://mdcallag.github.io/reports/24_01_27.1u.1tno.io.bee.pg/all.html#summary">the summary</a>:</div></div><div><ul><li>The base case is <span style="text-align: right;">pg9023_def which means Postgres 9.0.23</span></li><li><span style="text-align: right;">For the read-heavy benchmark steps that do range queries (qr100, qr500, qr1000) throughput improved between Postgres 9.2 and 13 and then it drops by ~30% in Postgres 14.10 and I confirmed the drop is also in Postgres 14.0. 
I will start to explain this in another post.</span></li><li><span style="text-align: right;">For the read-heavy benchmark steps that do point queries (qp100, qp500, qp1000) throughput is mostly unchanged from 9.0.23 through 16.1.</span></li><li><span style="text-align: right;">For the write-heavy steps (l.i0, l.x, l.i1, l.i2) throughput improves a lot</span></li><ul><li><span style="text-align: right;">l.i0 - things get a lot better in Postgres 11.22</span></li><li><span style="text-align: right;">l.x - things get a lot better between Postgres 9.4.26 and 11.22</span></li><li><span style="text-align: right;">l.i1, l.i2 - things get a lot better in Postgres 12.17 likely because the query planner</span> overhead during DELETE statements has been reduced (<a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to_10.html" style="text-align: right;">see the comments</a><span style="text-align: right;"> </span><span style="text-align: right;">about get_actual_variable_range)</span></li></ul><li>Comparing throughput in Postgres 16.1 to 9.0.23</li><ul><li>Write-heavy -- Postgres 16 is faster</li><ul><li>l.i0, l.x, l.i1, l.i2 - relative QPS is <span style="background-color: #d9ead3;">1.22</span><span style="background-color: white;">, </span><span style="background-color: #d9ead3;">2.32</span><span style="background-color: white;">, </span><span style="background-color: #d9ead3;">1.83</span><span style="background-color: white;">, </span><span style="background-color: #d9ead3;">1.87</span></li></ul><li><span style="background-color: white;">Range queries -- Postgres 16 is mostly slower</span></li><ul><li>qr100, qr500, qr1000 - relative QPS is <span style="background-color: #f4cccc;">0.81</span>, <span style="background-color: #f4cccc;">0.89</span>, <span style="background-color: #eeeeee;">1.01</span></li></ul><li><span style="background-color: white;">Point queries -- Postgres 16 is slightly slower</span></li><ul><li>qp100, qp500, qp1000 - 
relative QPS is <span style="background-color: #eeeeee;">0.98</span>, <span style="background-color: #eeeeee;">0.96</span>, <span style="background-color: #eeeeee;">1.00</span></li></ul></ul></ul><div><b>Target write rates</b></div><div><b><br /></b></div><div>The third table in the summary shows the write rates sustained during the read-write benchmark steps. The target write rates are 100/s for qr100 and qp100, 500/s for qr500 and qp500 and then 1000/s for qr1000 and qp1000. Note that X/s means X inserts/s and X deletes/s. When the value is close enough to the target then I assume the target has been sustained. The table cells in red indicate the cases where the target has not been sustained.</div><div><ul style="text-align: left;"><li>For qr100, qp100, qr500, qp500 -- all versions sustained the targets</li><li>For qr1000, qp1000 - only Postgres 12.17 and 13.13 sustained the targets.</li></ul><div>One session is used for INSERT statements and another for DELETE statements. They run at the same rate so if one session runs slow, then both will be slow. I assume the problem here is that DELETE processing is slow and this is related to changes in get_actual_variable_range.</div><div><br /></div><div>The following table shows the number of CPU seconds consumed per connection during the qp1000 benchmark step. 
There is:</div><div><ul style="text-align: left;"><li>a big increase in CPU starting in 12.17 for the query connection</li><li>a big decrease in CPU starting in 12.17 for the delete connection</li><li>a big increase in CPU starting in 14.10 for the query and delete connection</li></ul></div><div><span style="font-family: courier;">CPU seconds per connection during qp1000</span></div><div><div><span style="font-family: courier;">* query = connection that does point queries</span></div><div><span style="font-family: courier;">* ins = connection that does inserts</span></div><div><span style="font-family: courier;">* del = connection that does deletes</span></div><div><span style="font-family: courier;"><br /></span></div><div><span style="font-family: courier;"> query ins del</span></div><div><span style="font-family: courier;">11.22 626 157 3657</span></div><div><span style="font-family: courier;">12.17 311 144 1671</span></div><div><span style="font-family: courier;">13.13 312 145 1758</span></div><div><span style="font-family: courier;">14.10 595 158 3596</span></div><div><span style="font-family: courier;">15.5 609 156 3714</span></div><div><span style="font-family: courier;">16.1 612 158 3716</span></div></div><div><br /></div></div><div><br /></div><div><br /></div><div><br /></div></div></div></div></div></div></div></div></div>Mark Callaghanhttp://www.blogger.com/profile/09590445221922043181noreply@blogger.com2tag:blogger.com,1999:blog-9149523927864751087.post-56975297866485725402024-01-25T11:37:00.000-08:002024-01-25T11:37:41.628-08:00Updated Insert benchmark: InnoDB/MySQL 5.6, 5.7 and 8.0, small server, cached database<p>I now have 4 server types at home (8 cores + 16G RAM, 8 cores + 32G RAM, 24 cores, 32 cores) and am trying to finish a round of the <a href="https://smalldatum.blogspot.com/2023/12/updates-for-insert-benchmark-december.html">Insert Benchmark</a> for each. 
This has results for the smallest (8 cores + 16G RAM) using a cached workload and MySQL 5.6, 5.7, 8.0.</p><p>tl;dr</p><p></p><ul style="text-align: left;"><li>For this setup MySQL has large regressions over time while Postgres <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to_24.html">does not</a></li><li>The regressions in MySQL are large here, but smaller on workloads with more concurrency</li><li>There are few regressions within the 5.6 and 5.7 release cycles</li><li>There are large regressions within the 8.0 release cycle</li><li>There are large regressions at the start of the 5.7 and 8.0 release cycles</li><li>Enabling the perf schema reduces throughput by ~4% for most write heavy benchmark steps, by ~10% for read heavy benchmark steps and a lot more for index create</li></ul><p></p><div><b>Build + Configuration</b></div><div><div><div><br /></div><div>I tested many versions of MySQL 5.6, 5.7 and 8.0. These were compiled from source. I used the CMake files <a href="https://github.com/mdcallag/mytools/tree/master/bench/build/dec23.cmk.patch.mysql">from here</a> with the <a href="https://github.com/mdcallag/mytools/tree/master/bench/build/dec23.cmk.patch.mysql">patches here</a> to fix problems that otherwise prevent compiling older MySQL releases on modern Ubuntu. In all cases I use the <i><b>rel</b></i> build that uses CMAKE_BUILD_TYPE=Release.<br /><br />I used the cz10a_bee my.cnf files that are here <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/nuc8i7.ub1804/my56/my.cnf.cz10a_bee">for 5.6</a>, <a href="https://github.com/mdcallag/mytools/tree/master/bench/conf/nuc8i7.ub1804/my57">for 5.7</a> and <a href="https://github.com/mdcallag/mytools/tree/master/bench/conf/nuc8i7.ub1804/my80/etc">for 8.0</a>. 
For 5.7 and 8.0 there are many variants of that file to make them work on a range of the point releases.</div></div><div><br /></div><div>The versions I tested are:</div><div><ul><li>5.6</li><ul><li>5.6.21, 5.6.31, 5.6.41, 5.6.51</li></ul><li>5.7</li><ul><li>5.7.10, 5.7.20, 5.7.30, 5.7.44</li></ul><li>8.0</li><ul><li>8.0.13, 8.0.14, 8.0.20, 8.0.28, 8.0.35, 8.0.36</li></ul></ul><div>For 8.0.35 I tested a few variations from what is described above to understand the cost of the performance schema:</div></div><div><ul><li><span style="text-align: right;">my8035_rel.cz10aps0_bee</span></li><ul><li><span style="text-align: right;">this uses my.cnf.cz10aps0_bee which is the same as my.cnf.cz10a_bee except it adds performance_schema=0</span></li></ul><li><span style="text-align: right;">my8035_rel_lessps.cz10a_bee</span></li><ul><li><span style="text-align: right;">the build disables as much as possible of the performance schema. The CMake file <a href="https://github.com/mdcallag/mytools/blob/master/bench/build/dec23.cmk.patch.mysql/mysql-8.0.35/cmk.80.rel_lessps">is here</a>.</span></li></ul></ul><div style="text-align: right;"><div style="text-align: left;"><b>Benchmark</b></div><div style="text-align: left;"><div><br /></div><div>The test server is a Beelink SER4 with 8 cores, 16G RAM, Ubuntu 22.04 and XFS using 1 m.2 device. The benchmark is run with one client.</div><div><br /></div><div>I used the updated Insert Benchmark so there are more benchmark steps described below. In order, the benchmark steps are:</div><p></p><div><ul><li>l.i0</li><ul><li>insert 30 million rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client.</li></ul><li>l.x</li><ul><li>create 3 secondary indexes per table. There is one connection per client.</li></ul><li>l.i1</li><ul><li>use 2 connections/client. One inserts 50M rows and the other does deletes at the same rate as the inserts. 
Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate.</li></ul><li>l.i2</li><ul><li>like l.i1 but each transaction modifies 5 rows (small transactions).</li></ul><li>qr100</li><ul><li>use 3 connections/client. One does range queries for 1800 seconds and performance is reported for this. The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. This step is run for a fixed amount of time. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested.</li></ul><li>qp100</li><ul><li>like qr100 except uses point queries on the PK index</li></ul><li>qr500</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qp500</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qr1000</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul><li>qp1000</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul></ul><div><div><b>Results</b></div><div><br /></div><div>The performance reports are here for <a href="https://mdcallag.github.io/reports/24_01_25.8u.1tno.mem.bee.my.56/all.html">MySQL 5.6</a>, <a href="https://mdcallag.github.io/reports/24_01_25.8u.1tno.mem.bee.my.57/all.html">MySQL 5.7</a>, <a href="https://mdcallag.github.io/reports/24_01_25.8u.1tno.mem.bee.my.80/all.html">MySQL 8.0</a> and <a href="https://mdcallag.github.io/reports/24_01_25.8u.1tno.mem.bee.my.all/all.html">MySQL 5.6 to 8.0</a>.<br /><br /></div><div><div>The summary has 3 tables. The first shows absolute throughput by DBMS tested X benchmark step. 
The second has throughput relative to the version on the first row of the table. The third shows the background insert rate for benchmark steps with background inserts and all systems sustained the target rates. The second table makes it easy to see how performance changes over time.</div><div><br /></div><div>Below I use relative QPS to explain how performance changes. It is: (QPS for $me / QPS for $base) where $me is my version and $base is the version of the base case. When relative QPS is > 1.0 then performance improved over time. When it is < 1.0 then there are regressions. The Q in relative QPS measures: </div><div><ul><li>insert/s for l.i0, l.i1, l.i2</li><li>indexed rows/s for l.x</li><li>range queries/s for qr100, qr500, qr1000</li><li>point queries/s for qp100, qp500, qp1000</li></ul><div>Below I use colors to highlight the relative QPS values with <span style="background-color: #f4cccc;">red</span> for <= 0.95, <span style="background-color: #d9ead3;">green</span> for >= 1.05 and <span style="background-color: #eeeeee;">grey</span> for values between 0.95 and 1.05.</div></div></div><div><br /></div><div>From the summary <a href="https://mdcallag.github.io/reports/24_01_25.8u.1tno.mem.bee.my.56/all.html#summary">for 5.6</a></div><div><ul><li>The base case is 5.6.21</li><li>Throughput in 5.6.51 is ~2% less than 5.6.21</li></ul></div><div><div>From the summary <a href="https://mdcallag.github.io/reports/24_01_25.8u.1tno.mem.bee.my.57/all.html#summary">for 5.7</a></div><div><ul><li>The base case is 5.7.10</li><li>Throughput in 5.7.44 is ~3% less than 5.7.10</li></ul></div></div><div><div>From the summary <a href="https://mdcallag.github.io/reports/24_01_25.8u.1tno.mem.bee.my.80/all.html#summary">for 8.0</a></div><div><ul style="text-align: left;"><li>The base case is 8.0.13</li><li>I ignore the 8.0.35 variations (cz10aps0_bee config, rel_lessps build) for now</li><li>Unlike MySQL 5.6 and 5.7 above, there are larger regressions during the 8.0 cycle. 
Comparing 8.0.36 with 8.0.13</li><ul><li>l.i0 - <span style="background-color: white;">relative QPS is </span><span style="background-color: #f4cccc;">0.81</span> in 8.0.36</li><li>l.x (create index) - I ignore this for now but <a href="https://smalldatum.blogspot.com/2023/12/create-innodb-indexes-2x-faster-with.html">read this</a></li><li>l.i1, l.i2 - <span style="background-color: white;">relative QPS is </span><span style="background-color: #f4cccc;">0.91</span><span style="background-color: white;"> and </span><span style="background-color: #f4cccc;">0.80</span><span style="background-color: white;"> in 8.0.36</span></li><li>qr100, qr500, qr1000 - <span style="background-color: white;">relative QPS is </span><span style="background-color: #eeeeee;">0.97</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">0.96</span><span style="background-color: white;"> and </span><span style="background-color: #f4cccc;">0.94</span><span style="background-color: white;"> in 8.0.36</span></li><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #f4cccc;">0.86</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.88</span><span style="background-color: white;"> and </span><span style="background-color: #f4cccc;">0.84</span> in 8.0.36</li></ul></ul><div>From the summary <a href="https://mdcallag.github.io/reports/24_01_25.8u.1tno.mem.bee.my.80/all.html#summary">for 8.0</a> focusing on the 8.0.35 variations that disable the perf schema</div><div><ul><li>Throughput for write-heavy steps (l.i0, l.i1, l.i2) is up to 4% better</li><li>Throughput for read-heavy steps (qr*, qp*) is ~11% better</li><li>Throughput for parallel index create is ~1.5X better (<a href="https://smalldatum.blogspot.com/2023/12/create-innodb-indexes-2x-faster-with.html">read this</a>)</li></ul></div><div><div>From the summary for <a 
href="https://mdcallag.github.io/reports/24_01_25.8u.1tno.mem.bee.my.all/all.html#summary">5.6, 5.7, 8.0</a></div><div><ul style="text-align: left;"><li>The base case is 5.6.21</li><li>Comparing 5.7.44 and 8.0.36 with 5.6.21 shows the large regressions</li><ul><li>l.i0</li><ul><li>relative QPS is <span style="background-color: #f4cccc;">0.81</span> in 5.7.44</li><li>relative QPS is <span style="background-color: #f4cccc;">0.55</span> in 8.0.36</li></ul><li>l.x - I ignore this for now</li><li>l.i1, l.i2</li><ul><li>relative QPS is <span style="background-color: #d9ead3;">1.10</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.86</span> in 5.7.44</li><li>relative QPS is <span style="background-color: #f4cccc;">0.91</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.71</span> in 8.0.36</li></ul><li>qr100, qr500, qr1000</li><ul><li>relative QPS is <span style="background-color: #f4cccc;">0.73</span>, <span style="background-color: #f4cccc;">0.72</span>, <span style="background-color: #f4cccc;">0.72</span> in 5.7.44</li><li>relative QPS is <span style="background-color: #f4cccc;">0.62</span>, <span style="background-color: #f4cccc;">0.63</span>, <span style="background-color: #f4cccc;">0.62</span> in 8.0.36</li></ul><li>qp100, qp500, qp1000</li><ul><li>relative QPS is <span style="background-color: #f4cccc;">0.81</span>, <span style="background-color: #f4cccc;">0.80</span>, <span style="background-color: #f4cccc;">0.80</span> in 5.7.44</li><li>relative QPS is <span style="background-color: #f4cccc;">0.60</span>, <span style="background-color: #f4cccc;">0.61</span>, <span style="background-color: #f4cccc;">0.61</span> in 8.0.36</li></ul></ul></ul></div></div></div></div></div></div></div></div></div></div>Mark 
Callaghanhttp://www.blogger.com/profile/09590445221922043181noreply@blogger.com0tag:blogger.com,1999:blog-9149523927864751087.post-70868477042922285582024-01-24T12:07:00.000-08:002024-03-17T18:30:45.332-07:00Updated Insert benchmark: Postgres 9.x to 16.x, small server, cached database, v3<p>I now have 4 server types at home (8 cores + 16G RAM, 8 cores + 32G RAM, 24 cores, 32 cores) and am trying to finish a round of the <a href="https://smalldatum.blogspot.com/2023/12/updates-for-insert-benchmark-december.html">Insert Benchmark</a> for each. This has results for the smallest (8 cores + 16G RAM) using a cached workload and Postgres.<br /><br />In <a href="https://smalldatum.blogspot.com/2023/12/perf-regressions-in-mysql-from-56-to-80.html">previous blog posts</a> I claimed that there are large regressions from old to new MySQL but not from old to new Postgres. And I shared results for MySQL 5.6, 5.7 and 8.0 along with Postgres versions 10 through 16. A comment about these results is the comparison was unfair because the first GA MySQL 5.6 release is 5.6.10 from 2013 while the first Postgres 10 GA release is 10.0 from 2017.<br /><br />Here I have results going back to Postgres 9.0.23 and the first 9.0 release is 9.0.0 from 2010.<br /><br />tl;dr</p><p></p><ul style="text-align: left;"><li>the song remains the same: MySQL has large regressions over time while Postgres avoids them</li><li>comparing Postgres 16.1 with Postgres 9.0.23</li><ul><li>for write-heavy benchmark steps PG 16.1 gets between 1.2X and 2.8X more throughput</li><li>for range queries PG 16.1 gets ~1.2X more throughput</li><li>for point queries PG 16.1 gets ~1.1X more throughput</li></ul></ul><p></p><p><b>Build + Configuration</b></p><div><div>See the <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to.html">previous report</a> for more details. 
I used these versions: 9.0.23, 9.1.24, 9.2.24, 9.3.25, 9.4.26, 9.5.25, 9.6.24, 10.23, 11.22, 12.17, 13.13, 14.10, 15.5, 16.1. </div><div><br />The configuration files are in subdirectories <a href="https://github.com/mdcallag/mytools/tree/master/bench/conf/nuc8i7.ub1804">from here</a>. Search for files named <i>conf.diff.cx9a2_bee</i> which exist for each major version of Postgres<i>.</i></div><div><br /></div><div><b>The Benchmark</b></div><div><br /></div><div>The benchmark is <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to.html">explained here</a> except the first benchmark step, l.i0, loads 30M rows/table here while previously it only loaded 20M. The database still fits in memory as the test server has 16G of RAM and the database tables are ~8G. The benchmark is run with 1 client.</div><div><br /></div><div>The test server was named SER4 in the previous report. It has 8 cores, 16G RAM, Ubuntu 22.04 and XFS using 1 m.2 device.</div><div><br />The benchmark steps are:<div><p></p><div><ul><li>l.i0</li><ul><li>insert 30 million rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client.</li></ul><li>l.x</li><ul><li>create 3 secondary indexes per table. There is one connection per client.</li></ul><li>l.i1</li><ul><li>use 2 connections/client. One inserts 40M rows and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate.</li></ul><li>l.i2</li><ul><li>like l.i1 but each transaction modifies 5 rows (small transactions) and 10M rows total</li><li>Wait for X seconds after the step finishes to reduce variance during the read-write benchmark steps that follow.</li></ul><li>qr100</li><ul><li>use 3 connections/client. One does range queries for 1800 seconds and performance is reported for this. 
The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. This step is run for a fixed amount of time. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested.</li></ul><li>qp100</li><ul><li>like qr100 except uses point queries on the PK index</li></ul><li>qr500</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qp500</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qr1000</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul><li>qp1000</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul></ul><div><div><b>Results</b></div><div><br /></div><div>The performance report <a href="https://mdcallag.github.io/reports/24_01_24.8u.1tno.mem.bee.pg/all.html">is here</a>.</div><div><br /></div><div>The summary has 3 tables. The first shows absolute throughput by DBMS tested X benchmark step. The second has throughput relative to the version on the first row of the table. The third shows the background insert rate for benchmark steps with background inserts and all systems sustained the target rates. The second table makes it easy to see how performance changes over time.</div><div><br /></div><div>Below I use relative QPS to explain how performance changes. It is: (QPS for $me / QPS for $base) where $me is my version and $base is the version of the base case. When relative QPS is > 1.0 then performance improved over time. When it is < 1.0 then there are regressions. 
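As a sketch, the relative QPS metric and the color buckets used in these summaries can be written as follows; the 0.95 and 1.05 thresholds are the ones from this post, while the absolute rates in the example are hypothetical:

```python
# Sketch of the relative-QPS metric and the red/green/grey buckets used in
# these summaries. Thresholds (0.95, 1.05) are from the post; everything
# else is illustrative.
def relative_qps(qps_me: float, qps_base: float) -> float:
    # (QPS for $me / QPS for $base)
    return qps_me / qps_base

def bucket(rel: float) -> str:
    if rel <= 0.95:
        return "red"    # regression
    if rel >= 1.05:
        return "green"  # improvement
    return "grey"       # within the noise band
```

For example, with hypothetical absolute rates, relative_qps(2820, 1000) is 2.82 and bucket(2.82) is "green", which is how the l.i1 comparison of Postgres 16.1 to 9.0.23 is shown below.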
The Q in relative QPS measures: </div><div><ul><li>insert/s for l.i0, l.i1, l.i2</li><li>indexed rows/s for l.x</li><li>range queries/s for qr100, qr500, qr1000</li><li>point queries/s for qp100, qp500, qp1000</li></ul><div>Below I use colors to highlight the relative QPS values with <span style="background-color: #f4cccc;">red</span> for <= 0.95, <span style="background-color: #d9ead3;">green</span> for >= 1.05 and <span style="background-color: #eeeeee;">grey</span> for values between 0.95 and 1.05.</div><div><br /></div><div>From <a href="https://mdcallag.github.io/reports/24_01_24.8u.1tno.mem.bee.pg/all.html#summary">the summary</a>:</div></div><div><ul style="text-align: left;"><li>The base case is <span style="text-align: right;">pg9023_def which means Postgres 9.0.23</span></li><li><span style="text-align: right;">For most of the read-write benchmark steps throughput improves a lot from 9.1.24 to 9.2.24 and has been stable since then. The exception is the last step (qp1000) for which throughput is flat. 
It might be that writeback and/or vacuum hurts query throughput by that point.</span></li><li><span style="text-align: right;">For the write-heavy steps (l.i0, l.x, l.i1, l.i2) throughput improves a lot</span></li><ul><li><span style="text-align: right;">l.i0 - things get a lot better in Postgres 11.22</span></li><li><span style="text-align: right;">l.x - things get a lot better in Postgres 9.6.24</span></li><li><span style="text-align: right;">l.i1 - things get a lot better in Postgres 9.5.25 and then again in 12.17</span></li><li><span style="text-align: right;">l.i2 - improvements are similar to l.i1 but not as good because of the query planner overhead during DELETE statements (<a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to_10.html">see the comments</a> about get_actual_variable_range)</span></li></ul><li>Comparing throughput in Postgres 16.1 to 9.0.23</li><ul><li>Write-heavy</li><ul><li>l.i0, l.x, l.i1, l.i2 - relative QPS is <span style="background-color: #d9ead3;">1.23</span><span style="background-color: white;">, </span><span style="background-color: #d9ead3;">1.81</span><span style="background-color: white;">, </span><span style="background-color: #d9ead3;">2.82</span><span style="background-color: white;">, </span><span style="background-color: #d9ead3;">2.69</span></li></ul><li><span style="background-color: white;">Range queries</span></li><ul><li>qr100, qr500, qr1000 - relative QPS is <span style="background-color: #d9ead3;">1.20</span>, <span style="background-color: #d9ead3;">1.24</span>, <span style="background-color: #d9ead3;">1.25</span></li></ul><li><span style="background-color: white;">Point queries</span></li><ul><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #d9ead3;">1.10</span>, <span style="background-color: #d9ead3;">1.09</span>, <span style="background-color: #eeeeee;">1.00</span></li></ul></ul></ul><div><br /></div></div></div></div></div></div></div>Mark 
Callaghanhttp://www.blogger.com/profile/09590445221922043181noreply@blogger.com0tag:blogger.com,1999:blog-9149523927864751087.post-53666541411128457222024-01-17T13:03:00.000-08:002024-01-17T13:03:46.779-08:00Updated Insert benchmark: MyRocks 5.6 and 8.0, medium server, IO-bound database, v2<p>This has results for the <a href="https://smalldatum.blogspot.com/2023/12/updates-for-insert-benchmark-december.html">Insert Benchmark</a> using MyRocks 5.6 and 8.0, a medium server and an IO-bound workload with a working set that isn't cached.</p><p>tl;dr</p><p></p><ul style="text-align: left;"><li>The cost from enabling the perf schema was insignificant for the write-heavy and point-query benchmark steps. It was significant for the range-query benchmark steps.</li></ul><div>Comparing latest MyRocks 5.6.35 to older MyRocks 5.6.35</div><div><ul style="text-align: left;"><li>Write-heavy perf mostly improves, especially on the initial load step (l.i0)</li><li>Point-query perf is stable</li><li>Range-query perf shows a big regression between the fbmy5635_rel_202210112144 and fbmy5635_rel_202302162102 builds</li></ul><div>Comparing latest MyRocks 8.0.32 to older MyRocks 5.6.35</div></div><div><ul style="text-align: left;"><li>The cost of the perf schema is large for range queries and otherwise not large</li><li>Write-heavy perf mostly improves, especially on the initial load step (l.i0)</li><li>Point-query perf is stable</li><li>Range-query perf shows a big regression between the fbmy5635_rel_202210112144 and fbmy5635_rel_202302162102 builds and doesn't recover in the 8.0 builds</li></ul></div><div>Comparing latest MyRocks 8.0.32 to latest MyRocks 5.6.35</div><div><ul style="text-align: left;"><li>Write-heavy perf is similar except for the initial load step (l.i0) in which 8.0 is almost 20% slower</li><li>Point-query perf is similar</li><li>Range-query perf is ~5% worse in 8.0</li></ul><div>Comparing latest MyRocks 8.0.32 to latest MyRocks 8.0.28</div></div><div><ul 
style="text-align: left;"><li>Results are similar</li></ul></div><p></p><p><b>Build + Configuration</b></p><div><p>See the <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-myrocks-56-and.html">previous report</a>.</p><p><b>Benchmark</b></p><p>See the <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-myrocks-56-and.html">previous report</a>. </p><div><b>Benchmark steps</b></div><div><br /></div><div>The benchmark is run with 8 clients and a client per table.</div><div><br /></div><div>The benchmark is a sequence of steps that are run in order:</div><div><ul><li>l.i0</li><ul><li>insert 500M rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client.</li></ul><li>l.x</li><ul><li>create 3 secondary indexes per table. There is one connection per client.</li></ul><li>l.i1</li><ul><li>use 2 connections/client. One does inserts as fast as possible and the other does deletes at the same rate as the inserts to avoid changing the number of rows in the table. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate.</li></ul><li>l.i2</li><ul><li>like l.i1 but each transaction modifies 5 rows (small transactions).</li><li>Wait for X seconds after the step finishes to reduce variance during the read-write benchmark steps that follow where X is max(1200, 60 + #nrows/1M). While waiting do things to reduce writeback debt where the things are:</li><ul><li>MyRocks (<a href="https://github.com/mdcallag/mytools/blob/dd901e3ef42ae8f0104830b1cac3fd778980508b/bench/ibench/iq.sh#L132">see here</a>) - set rocksdb_force_flush_memtable_now to flush the memtable, wait 20 seconds and then set rocksdb_compact_lzero_now to flush L0. Note that rocksdb_compact_lzero_now wasn't supported until mid-2023.</li></ul></ul><li>qr100</li><ul><li>use 3 connections/client. 
One does range queries as fast as possible and performance is reported for this. The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. This step is run for 1800 seconds. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested.</li></ul><li>qp100</li><ul><li>like qr100 except uses point queries on the PK index</li></ul><li>qr500</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qp500</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qr1000</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul><li>qp1000</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul></ul></div><div><b>Results</b></div><div><br /></div><div>The performance reports are here for</div><div><ul><li><a href="https://mdcallag.github.io/reports/24_01_17.8u.1tno.io.c2.fbmy.56/all.html">MyRocks 5.6</a> </li><li><a href="https://mdcallag.github.io/reports/24_01_17.8u.1tno.io.c2.fbmy.80/all.html">MyRocks 8.0</a></li><li><a href="https://mdcallag.github.io/reports/24_01_17.8u.1tno.io.c2.fbmy.all/all.html">MyRocks 5.6 & 8.0</a> with many 5.6 versions</li><li><a href="https://mdcallag.github.io/reports/24_01_17.8u.1tno.io.c2.fbmy.latest/all.html">MyRocks 5.6 & 8.0</a> with the latest versions</li></ul></div><div>The summary has 3 tables. The first shows absolute throughput by DBMS tested X benchmark step. The second has throughput relative to the version on the first row of the table. The third shows the background insert rate for benchmark steps with background inserts and all systems sustained the target rates. 
The second table makes it easy to see how performance changes over time.</div><div><br /></div><div>Below I use relative QPS to explain how performance changes. It is: (QPS for $me / QPS for $base) where $me is my version and $base is the version of the base case. When relative QPS is > 1.0 then performance improved over time. When it is < 1.0 then there are regressions. The Q in relative QPS measures: </div><div><ul><li>insert/s for l.i0, l.i1, l.i2</li><li>indexed rows/s for l.x</li><li>range queries/s for qr100, qr500, qr1000</li><li>point queries/s for qp100, qp500, qp1000</li></ul><div>Below I use colors to highlight the relative QPS values with <span style="background-color: #f4cccc;">red</span> for <= 0.95, <span style="background-color: #d9ead3;">green</span> for >= 1.05 and <span style="background-color: #eeeeee;">grey</span> for values between 0.95 and 1.05.</div><div><br /></div><div>From <a href="https://mdcallag.github.io/reports/24_01_17.8u.1tno.io.c2.fbmy.56/all.html#summary">the summary</a> for 5.6</div></div><div><ul><li>The base case is fbmy5635_rel_202104072149</li><li>Comparing throughput in fbmy5635_rel_221222 to the base case</li><ul><li>Write-heavy</li><ul><li>l.i0, l.x, l.i1, l.i2 - relative QPS is <span style="background-color: #d9ead3;">1.11</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.92</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">1.00</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">1.00</span></li></ul><li><span style="background-color: white;">Range queries</span></li><ul><li>qr100, qr500, qr1000 - relative QPS is <span style="background-color: #f4cccc;">0.62</span>, <span style="background-color: #f4cccc;">0.79</span>, <span style="background-color: #f4cccc;">0.77</span></li></ul><li><span style="background-color: white;">Point queries</span></li><ul><li>qp100, qp500, qp1000 - relative 
QPS is <span style="background-color: #eeeeee;">0.97</span>, <span style="background-color: #eeeeee;">1.00</span>, <span style="background-color: #eeeeee;">0.99</span></li></ul></ul></ul><div>From <a href="https://mdcallag.github.io/reports/24_01_17.8u.1tno.io.c2.fbmy.80/all.html#summary">the summary</a> for 8.0</div></div><div><ul><li>The base case is fbmy8028_rel_221222</li><li>The cost of the perf schema is <= 2% for write-heavy, <= 19% for range queries and <= 1% for point queries. I am not certain that the impact on range queries is all from the perf schema. I still need to explain why the range query benchmark steps have too much noise.</li><li>Comparing throughput in fbmy8032_rel_221222 to the base case</li><ul><li>Write-heavy</li><ul><li>l.i0, l.x, l.i1, l.i2 - relative QPS is <span style="background-color: #eeeeee;">0.96</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">0.98</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">0.99</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">0.97</span></li></ul><li><span style="background-color: white;">Range queries</span></li><ul><li>qr100, qr500, qr1000 - relative QPS is <span style="background-color: #eeeeee;">1.02</span>, <span style="background-color: #eeeeee;">1.04</span>, <span style="background-color: #eeeeee;">1.04</span></li></ul><li><span style="background-color: white;">Point queries</span></li><ul><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #eeeeee;">0.99</span>, <span style="background-color: #eeeeee;">1.00</span>, <span style="background-color: #eeeeee;">0.98</span></li></ul></ul></ul><div>From <a href="https://mdcallag.github.io/reports/24_01_17.8u.1tno.io.c2.fbmy.all/all.html#summary">the summary</a> for 5.6, 8.0 with many versions</div></div><div><ul><li>The base case is fbmy5635_rel_202104072149</li><li>Comparing throughput in 
fbmy8032_rel_221222 to the base case</li><ul><li>Write-heavy</li><ul><li>l.i0, l.x, l.i1, l.i2 - relative QPS is <span style="background-color: #f4cccc;">0.91</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.87</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">0.99</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">0.97</span></li></ul><li><span style="background-color: white;">Range queries</span></li><ul><li>qr100, qr500, qr1000 - relative QPS is <span style="background-color: #f4cccc;">0.58</span>, <span style="background-color: #f4cccc;">0.76</span>, <span style="background-color: #f4cccc;">0.74</span></li></ul><li><span style="background-color: white;">Point queries</span></li><ul><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #eeeeee;">0.98</span>, <span style="background-color: #eeeeee;">1.03</span>, <span style="background-color: #eeeeee;">1.01</span></li></ul></ul></ul><div>From <a href="https://mdcallag.github.io/reports/24_01_17.8u.1tno.io.c2.fbmy.latest/all.html#summary">the summary</a> for 5.6, 8.0 with latest versions</div></div><div><ul><li>The base case is fbmy5635_rel_221222</li><li>Comparing throughput in fbmy8032_rel_221222 to the base case</li><ul><li>Write-heavy</li><ul><li>l.i0, l.x, l.i1, l.i2 - relative QPS is <span style="background-color: #f4cccc;">0.82</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.95</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">0.98</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">0.97</span></li></ul><li><span style="background-color: white;">Range queries</span></li><ul><li>qr100, qr500, qr1000 - relative QPS is <span style="background-color: #f4cccc;">0.95</span>, <span style="background-color: 
#eeeeee;">0.96</span>, <span style="background-color: #eeeeee;">0.96</span></li></ul><li><span style="background-color: white;">Point queries</span></li><ul><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #eeeeee;">1.01</span>, <span style="background-color: #eeeeee;">1.02</span>, <span style="background-color: #eeeeee;">1.02</span></li></ul></ul></ul></div></div><div><br /></div>Mark Callaghanhttp://www.blogger.com/profile/09590445221922043181noreply@blogger.com0tag:blogger.com,1999:blog-9149523927864751087.post-83243326124576238022024-01-12T11:39:00.000-08:002024-01-24T11:34:02.307-08:00Updated Insert benchmark: MyRocks 5.6 and 8.0, small(est) server, cached database, v2<p>This has results for the <a href="https://smalldatum.blogspot.com/2023/12/updates-for-insert-benchmark-december.html">Insert Benchmark</a> using MyRocks 5.6 and 8.0, a small server and a cached workload. I have two versions of small servers -- Beelink SER4 with 16G of RAM, Beelink SER7 with 32G of RAM. This report uses the SER4. This report replaces a <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-myrocks-56-and_2.html">January 2</a> report for the Beelink SER4. The difference is that I improved the benchmark scripts to reduce compaction debt prior to the read-write benchmark steps. My intention was to reduce noise in the throughput results. 
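What the scripts do to reduce compaction debt is detailed in the benchmark steps below; a sketch of my reading of that description (not the actual script) looks like this, where run_sql is a stand-in for whatever client executes statements against the server:

```python
# Sketch of the debt-reduction step that runs after l.i2, per the
# description in this post. `run_sql` is a hypothetical helper; the
# variable names are the MyRocks system variables named in the post.
import time

def reduce_compaction_debt(run_sql, nrows, supports_compact_lzero=True):
    # Wait for X seconds where X = max(1200, 60 + #nrows/1M).
    wait_secs = max(1200, 60 + nrows // 1_000_000)
    # While waiting, reduce writeback debt:
    run_sql("SET GLOBAL rocksdb_force_flush_memtable_now = 1")
    time.sleep(20)  # give the memtable flush time to finish
    if supports_compact_lzero:  # only in MyRocks builds from mid-2023 on
        run_sql("SET GLOBAL rocksdb_compact_lzero_now = 1")
    return wait_secs
```

For the 30M-row tables used here the wait works out to max(1200, 60 + 30) = 1200 seconds.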
Alas, I have more work to do.</p><p>tl;dr</p><p></p><ul style="text-align: left;"><li>Enabling the perf schema reduces throughput by up to 10% for write-heavy and up to 5% for read-heavy.</li><li>The range query benchmark steps (qr*) have too much noise that I have yet to explain</li><li>Comparing latest MyRocks 8.0.32 to 5.6.35 shows</li><ul><li>8.0.32 gets 20% to 30% less throughput for write-heavy</li><li>8.0.32 gets ~10% less throughput for point queries</li><li>There is too much noise on the range query benchmark steps</li></ul></ul><p></p><ul style="text-align: left;"></ul><p></p><div><b>Build + Configuration</b></div><div><b><br /></b></div><div>See the <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-myrocks-56-and_2.html">previous report</a>.</div><div><b><br /></b></div><div><b>Benchmark</b></div><p></p><div><div>The server is a Beelink SER4 <a href="https://smalldatum.blogspot.com/2022/10/small-servers-for-performance-testing-v4.html">described here</a> with 8 cores, 16G RAM, Ubuntu 22.04 and XFS on a fast m.2 NVMe device. The benchmark is run with 1 client.</div><div><br /></div><div>The benchmark is a sequence of steps that are run in order:</div><div><ul><li>l.i0</li><ul><li>insert 30M rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client.</li></ul><li>l.x</li><ul><li>create 3 secondary indexes per table. There is one connection per client.</li></ul><li>l.i1</li><ul><li>use 2 connections/client. One does inserts as fast as possible and the other does deletes at the same rate as the inserts to avoid changing the number of rows in the table. Each transaction modifies 50 rows (big transactions). 
This step is run for a fixed number of inserts, so the run time varies depending on the insert rate.</li></ul><li>l.i2</li><ul><li>like l.i1 but each transaction modifies 5 rows (small transactions).</li><li>Wait for X seconds after the step finishes to reduce variance during the read-write benchmark steps that follow where X is max(1200, 60 + #nrows/1M). While waiting do things to reduce writeback debt where the things are:</li><ul><li>MyRocks (<a href="https://github.com/mdcallag/mytools/blob/dd901e3ef42ae8f0104830b1cac3fd778980508b/bench/ibench/iq.sh#L132">see here</a>) - set rocksdb_force_flush_memtable_now to flush the memtable, wait 20 seconds and then set rocksdb_compact_lzero_now to flush L0. Note that rocksdb_compact_lzero_now wasn't supported until mid-2023.</li></ul></ul><li>qr100</li><ul><li>use 3 connections/client. One does range queries as fast as possible and performance is reported for this. The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. This step is run for 1800 seconds. If the target insert rate is not sustained then that is considered to be an SLA failure. 
If the target insert rate is sustained then the step does the same number of inserts for all systems tested.</li></ul><li>qp100</li><ul><li>like qr100 except uses point queries on the PK index</li></ul><li>qr500</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qp500</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qr1000</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul><li>qp1000</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul></ul></div><div><b>Results</b></div><div><br /></div><div>The performance reports are here for</div><div><ul><li><a href="https://mdcallag.github.io/reports/24_01_12.1u.1tno.cached.bee.30m.fbmy.56/all.html">MyRocks 5.6</a> </li><li><a href="https://mdcallag.github.io/reports/24_01_12.1u.1tno.cached.bee.30m.fbmy.80/all.html">MyRocks 8.0</a></li><li><a href="https://mdcallag.github.io/reports/24_01_12.1u.1tno.cached.bee.30m.fbmy.all/all.html">MyRocks 5.6 & 8.0</a> with many 5.6 versions</li><li><a href="https://mdcallag.github.io/reports/24_01_12.1u.1tno.cached.bee.30m.fbmy.latest/all.html">MyRocks 5.6 & 8.0</a> with the latest versions</li></ul></div><div>The summary has 3 tables. The first shows absolute throughput by DBMS tested X benchmark step. The second has throughput relative to the version on the first row of the table. The third shows the background insert rate for benchmark steps with background inserts and all systems sustained the target rates. The second table makes it easy to see how performance changes over time.</div><div><br /></div><div>Below I use relative QPS to explain how performance changes. It is: (QPS for $me / QPS for $base) where $me is my version and $base is the version of the base case. When relative QPS is > 1.0 then performance improved over time. When it is < 1.0 then there are regressions. 
The Q in relative QPS measures: </div><div><ul><li>insert/s for l.i0, l.i1, l.i2</li><li>indexed rows/s for l.x</li><li>range queries/s for qr100, qr500, qr1000</li><li>point queries/s for qp100, qp500, qp1000</li></ul><div>Below I use colors to highlight the relative QPS values with <span style="background-color: #f4cccc;">red</span> for <= 0.95, <span style="background-color: #d9ead3;">green</span> for >= 1.05 and <span style="background-color: #eeeeee;">grey</span> for values between 0.95 and 1.05.</div><div><br /></div><div>The range query benchmark steps suffer from too much noise that I have yet to explain.</div><div><br /></div><div>From <a href="https://mdcallag.github.io/reports/24_01_12.1u.1tno.cached.bee.30m.fbmy.56/all.html#summary">the summary</a> for 5.6</div></div><div><ul><li>The base case is fbmy5635_rel_202104072149</li><li>The results with the builds that use clang are similar to gcc except for the l.i0 and l.ix benchmark steps. I <a href="https://github.com/llvm/llvm-project/issues/55153">opened a bug</a> against LLVM for code generation related to crc32 functions.</li><li>Comparing throughput in fbmy5635_rel_221222 to the base case</li><ul><li>Write-heavy</li><ul><li>l.i0, l.x, l.i1, l.i2 - relative QPS is <span style="background-color: #eeeeee;">0.96</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">0.98</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">0.99</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">1.00</span></li></ul><li><span style="background-color: white;">Range queries</span></li><ul><li>qr100, qr500, qr1000 - relative QPS is <span style="background-color: #f4cccc;">0.80</span>, <span style="background-color: #f4cccc;">0.86</span>, <span style="background-color: #d9ead3;">1.63</span></li></ul><li><span style="background-color: white;">Point queries</span></li><ul><li>qp100, qp500, qp1000 - 
relative QPS is <span style="background-color: #eeeeee;">0.97</span>, <span style="background-color: #eeeeee;">1.00</span>, <span style="background-color: #eeeeee;">1.03</span></li></ul></ul></ul><div>From <a href="https://mdcallag.github.io/reports/24_01_12.1u.1tno.cached.bee.30m.fbmy.80/all.html#summary">the summary</a> for 8.0</div></div><div><ul><li>The base case is fbmy8028_rel_20220829_752</li><li>The results with clang are worse than gcc. See the previous section for details.</li><li>Comparing throughput in fbmy8032_rel_221222 to the base case</li><ul><li>Write-heavy</li><ul><li>l.i0, l.x, l.i1, l.i2 - relative QPS is <span style="background-color: #eeeeee;">0.98</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">1.02</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">1.01</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">1.03</span></li></ul><li><span style="background-color: white;">Range queries</span></li><ul><li>qr100, qr500, qr1000 - relative QPS is <span style="background-color: #d9ead3;">1.33</span>, <span style="background-color: #f4cccc;">0.95</span>, <span style="background-color: #f4cccc;">0.94</span></li></ul><li><span style="background-color: white;">Point queries</span></li><ul><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #eeeeee;">0.97</span>, <span style="background-color: #eeeeee;">0.98</span>, <span style="background-color: #eeeeee;">0.97</span></li></ul></ul></ul><div>From <a href="https://mdcallag.github.io/reports/24_01_12.1u.1tno.cached.bee.30m.fbmy.all/all.html#summary">the summary</a> for 5.6, 8.0 with many versions</div></div><div><ul><li>The base case is fbmy5635_rel_202104072149</li><li>Enabling the perf schema costs up to 10% of throughput for write-heavy and up to 5% for read-heavy.</li><li>Comparing throughput in fbmy8032_rel_221222 to the base 
case</li><ul><li>Write-heavy</li><ul><li>l.i0, l.x, l.i1, l.i2 - relative QPS is <span style="background-color: #f4cccc;">0.69</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.88</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.83</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.84</span></li></ul><li><span style="background-color: white;">Range queries</span></li><ul><li>qr100, qr500, qr1000 - relative QPS is <span style="background-color: #f4cccc;">0.92</span>, <span style="background-color: #d9ead3;">1.03</span>, <span style="background-color: #d9ead3;">1.55</span></li></ul><li><span style="background-color: white;">Point queries</span></li><ul><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #f4cccc;">0.88</span>, <span style="background-color: #f4cccc;">0.89</span>, <span style="background-color: #f4cccc;">0.92</span></li></ul></ul></ul><div>From <a href="https://mdcallag.github.io/reports/24_01_12.1u.1tno.cached.bee.30m.fbmy.latest/all.html#summary">the summary</a> for 5.6, 8.0 with latest versions</div></div><div><ul><li>The base case is fbmy5635_rel_221222</li><li>Comparing throughput in fbmy8032_rel_221222 to the base case</li><ul><li>Write-heavy</li><ul><li>l.i0, l.x, l.i1, l.i2 - relative QPS is <span style="background-color: #f4cccc;">0.68</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.88</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.81</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.79</span></li></ul><li><span style="background-color: white;">Range queries</span></li><ul><li>qr100, qr500, qr1000 - relative QPS is <span style="background-color: #eeeeee;">1.02</span>, <span style="background-color: #d9ead3;">1.30</span>, <span 
style="background-color: #f4cccc;">0.90</span></li></ul><li><span style="background-color: white;">Point queries</span></li><ul><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #f4cccc;">0.90</span>, <span style="background-color: #f4cccc;">0.89</span>, <span style="background-color: #f4cccc;">0.89</span></li></ul></ul></ul><div><br /></div><div><br /></div><div><br /></div></div></div>Mark Callaghanhttp://www.blogger.com/profile/09590445221922043181noreply@blogger.com0tag:blogger.com,1999:blog-9149523927864751087.post-67657769179907035672024-01-12T09:46:00.000-08:002024-01-12T11:24:32.187-08:00Updated Insert benchmark: MyRocks 5.6 and 8.0, small server, cached database, v2<p>This has results for the <a href="https://smalldatum.blogspot.com/2023/12/updates-for-insert-benchmark-december.html">Insert Benchmark</a> using MyRocks 5.6 and 8.0, a small server and a cached workload. I have two versions of small servers -- Beelink SER4 with 16G of RAM, Beelink SER7 with 32G of RAM. This report uses the SER7. A recent report from the Beelink SER4 <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-myrocks-56-and_2.html">is here</a> but that report will be replaced in a few days.</p><p>tl;dr</p><p></p><ul style="text-align: left;"><li>Some of the regressions between MyRocks 5.6 and 8.0 come from upstream. Here that shows up on the l.i0, qp100, qp500 and qr1000 benchmark steps.</li><li>There is too much noise in the range query benchmark steps (qr*) that I have yet to explain</li></ul><p></p><p><b>Noise</b></p><p>I recently improved the benchmark scripts to remove writeback and compaction debt after the l.i2 benchmark step to reduce noise in the read-write steps that follow. At least for MyRocks, the range query benchmark steps (qr100, qr500, qr1000) have more noise. The worst case for noise with MyRocks is the qr100 step, and this is more obvious on a small server. 
</p><p>For MyRocks, the benchmark script now does the following after l.i2:</p><p></p><ul style="text-align: left;"><li>wait for X seconds where X = max(1200, 60 + #rows / 1M)</li><li>while waiting: flush memtable, wait 20 seconds, compact L0 into L1. But compacting L0 into L1 is only done for MyRocks builds from mid-2023 or newer because the feature I used for that was buggy prior to mid-2023.</li></ul><div>When the qr100 benchmark step starts the memtable is empty and the L0 might be empty. On small servers when I run the benchmark step for less than one hour the memtable never gets full and there are no memtable flushes. On larger servers the memtable is likely to be flushed many times.</div><div><br /></div><div>Regardless, I have yet to figure out why there is more noise with MyRocks on the range query benchmark steps. Until then, with MyRocks I focus on qr500 and qr1000 or on the results from larger servers in my search for regressions in range queries. What I see now is that the CPU/query overhead changes significantly, but I need to explain why that happens.</div><p></p><p></p><p><b>Build + Configuration</b></p><div>I tested MyRocks 5.6.35, 8.0.28 and 8.0.32 using the latest code as of December 2023. I also repeated tests for older builds for MyRocks 5.6.35 and 8.0.28. These were compiled from source. 
All builds use CMAKE_BUILD_TYPE =Release.</div><div><br />MyRocks 5.6.35 builds:</div><div><ul style="text-align: left;"><li>fbmy5635_rel_202104072149</li><ul><li>from code as of 2021-04-07 at git hash f896415f with RocksDB 6.19.0</li></ul><li>fbmy5635_rel_202203072101</li><ul><li>from code as of 2022-03-07 at git hash e7d976ee with RocksDB 6.28.2</li></ul><li>fbmy5635_rel_202205192101</li><ul><li>from code as of 2022-05-19 at git hash d503bd77 with RocksDB 7.2.2</li></ul><li>fbmy5635_rel_202208092101</li><ul><li>from code as of 2022-08-09 at git hash 877a0e58 with RocksDB 7.3.1</li></ul><li>fbmy5635_rel_202210112144</li><ul><li>from code as of 2022-10-11 at git hash c691c716 with RocksDB 7.3.1</li></ul><li>fbmy5635_rel_202302162102</li><ul><li>from code as of 2023-02-16 at git hash 21a2b0aa with RocksDB 7.10.0</li></ul><li>fbmy5635_rel_202304122154</li><ul><li>from code as of 2023-04-12 at git hash 205c31dd with RocksDB 7.10.2</li></ul><li>fbmy5635_rel_202305292102</li><ul><li>from code as of 2023-05-29 at git hash b739eac1 with RocksDB 8.2.1</li></ul><li>fbmy5635_rel_20230529_832</li><ul><li>from code as of 2023-05-29 at git hash b739eac1 with RocksDB 8.3.2</li></ul><li>fbmy5635_rel_20230529_843</li><ul><li>from code as of 2023-05-29 at git hash b739eac1 with RocksDB 8.4.3</li></ul><li>fbmy5635_rel_20230529_850</li><ul><li>from code as of 2023-05-29 at git hash b739eac1 with RocksDB 8.5.0</li></ul><li>fbmy5635_rel_221222</li><ul><li>from code as of 2023-12-22 at git hash 4f3a57a1, RocksDB 8.7.0 at git hash 29005f0b</li></ul></ul><div>MyRocks 8.0.28 builds:</div><div><ul style="text-align: left;"><li>fbmy8028_rel_20220829_752</li><ul><li>from code as of 2022-08-29 at git hash a35c8dfeab, RocksDB 7.5.2</li></ul><li>fbmy8028_rel_20230129_754</li><ul><li>from code as of 2023-01-29 at git hash 4d3d44a0459, RocksDB 7.5.4</li></ul><li>fbmy8028_rel_20230502_810</li><ul><li>from code as of 2023-05-02 at git hash d1ca8b276d, RocksDB 
8.1.0</li></ul><li>fbmy8028_rel_20230523_821</li><ul><li>from code as of 2023-05-23 at git hash b08cc536f1, RocksDB 8.2.1</li></ul><li>fbmy8028_rel_20230619_831</li><ul><li>from code as of 2023-06-19 at git hash 6164cf0274, RocksDB 8.3.1</li></ul><li>fbmy8028_rel_20230629_831</li><ul><li>from code as of 2023-06-29 at git hash ab522f6df7c, RocksDB 8.3.1</li></ul><li>fbmy8028_rel_221222</li><ul><li>from code as of 2023-12-22 at git hash 2ad105fc, RocksDB 8.7.0 at git hash 29005f0b</li></ul></ul><div>MyRocks 8.0.32 builds:</div><ul style="text-align: left;"><li>fbmy8032_rel_221222</li><ul><li>from code as of 2023-12-22 at git hash 76707b44, RocksDB 8.7.0 at git hash 29005f0b</li></ul></ul></div></div><p><b>Benchmark</b></p><div>The server is a Beelink SER7 <a href="https://smalldatum.blogspot.com/2022/10/small-servers-for-performance-testing-v4.html">described here</a> with 8 cores, 32G RAM, Ubuntu 22.04 and XFS on a fast m.2 NVMe device. The benchmark is run with 1 client.</div><div><br /></div><div>The benchmark is a sequence of steps that are run in order:</div><div><ul><li>l.i0</li><ul><li>insert 60M rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client.</li></ul><li>l.x</li><ul><li>create 3 secondary indexes per table. There is one connection per client.</li></ul><li>l.i1</li><ul><li>use 2 connections/client. One does inserts as fast as possible and the other does deletes at the same rate as the inserts to avoid changing the number of rows in the table. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate.</li></ul><li>l.i2</li><ul><li>like l.i1 but each transaction modifies 5 rows (small transactions).</li><li>Wait for X seconds after the step finishes to reduce variance during the read-write benchmark steps that follow where X is max(1200, 60 + #nrows/1M). 
While waiting do things to reduce writeback debt where the things are:</li><ul><li>MyRocks (<a href="https://github.com/mdcallag/mytools/blob/dd901e3ef42ae8f0104830b1cac3fd778980508b/bench/ibench/iq.sh#L132">see here</a>) - set rocksdb_force_flush_memtable_now to flush the memtable, wait 20 seconds and then set rocksdb_compact_lzero_now to flush L0. Note that rocksdb_compact_lzero_now wasn't supported until mid-2023.</li></ul></ul><li>qr100</li><ul><li>use 3 connections/client. One does range queries as fast as possible and performance is reported for this. The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. This step is run for 1800 seconds. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested.</li></ul><li>qp100</li><ul><li>like qr100 except uses point queries on the PK index</li></ul><li>qr500</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qp500</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qr1000</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul><li>qp1000</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul></ul></div><div><b>Results</b></div><div><br /></div><div>The performance reports are here for</div><div><ul><li><a href="https://mdcallag.github.io/reports/24_01_11.1u.1tno.cached.ser7.fbmy.56/all.html">MyRocks 5.6</a> </li><li><a href="https://mdcallag.github.io/reports/24_01_11.1u.1tno.cached.ser7.fbmy.80/all.html">MyRocks 8.0</a></li><li><a href="https://mdcallag.github.io/reports/24_01_11.1u.1tno.cached.ser7.fbmy.all/all.html">MyRocks 5.6 & 8.0</a> with many 5.6 versions</li><li><a
href="https://mdcallag.github.io/reports/24_01_11.1u.1tno.cached.ser7.fbmy.latest/all.html">MyRocks 5.6 & 8.0</a> with the latest versions</li></ul></div><div>The summary has 3 tables. The first shows absolute throughput by DBMS tested X benchmark step. The second has throughput relative to the version on the first row of the table. The third shows the background insert rate for benchmark steps with background inserts and all systems sustained the target rates. The second table makes it easy to see how performance changes over time.</div><div><br /></div><div>Below I use relative QPS to explain how performance changes. It is: (QPS for $me / QPS for $base) where $me is my version and $base is the version of the base case. When relative QPS is > 1.0 then performance improved over time. When it is < 1.0 then there are regressions. The Q in relative QPS measures: </div><div><ul><li>insert/s for l.i0, l.i1, l.i2</li><li>indexed rows/s for l.x</li><li>range queries/s for qr100, qr500, qr1000</li><li>point queries/s for qp100, qp500, qp1000</li></ul><div>Below I use colors to highlight the relative QPS values with <span style="background-color: #f4cccc;">red</span> for <= 0.95, <span style="background-color: #d9ead3;">green</span> for >= 1.05 and <span style="background-color: #eeeeee;">grey</span> for values between 0.95 and 1.05.</div><div><br /></div><div>The range query benchmark steps suffer from too much noise that I have yet to explain.</div><div><br /></div><div>From <a href="https://mdcallag.github.io/reports/24_01_11.1u.1tno.cached.ser7.fbmy.56/all.html#summary">the summary</a> for 5.6</div></div><div><ul><li>The base case is fbmy5635_rel_202104072149</li><li>Comparing throughput in fbmy5635_rel_221222 to the base case</li><ul><li>Write-heavy</li><ul><li>l.i0, l.x, l.i1, l.i2 - relative QPS is <span style="background-color: #f4cccc;">0.95</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">0.98</span><span 
style="background-color: white;">, </span><span style="background-color: #eeeeee;">0.97</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.95</span></li></ul><li><span style="background-color: white;">Range queries</span></li><ul><li>qr100, qr500, qr1000 - relative QPS is <span style="background-color: #f4cccc;">0.65</span>, <span style="background-color: #d9ead3;">1.11</span>, <span style="background-color: #f4cccc;">0.70</span></li></ul><li><span style="background-color: white;">Point queries</span></li><ul><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #eeeeee;">1.00</span>, <span style="background-color: #eeeeee;">0.99</span>, <span style="background-color: #eeeeee;">0.99</span></li></ul></ul></ul><div>From <a href="https://mdcallag.github.io/reports/24_01_11.1u.1tno.cached.ser7.fbmy.80/all.html#summary">the summary</a> for 8.0</div></div><div><ul><li>The base case is fbmy8028_rel_20220829_752</li><li>Comparing throughput in fbmy8032_rel_221222 to the base case</li><ul><li>Write-heavy</li><ul><li>l.i0, l.x, l.i1, l.i2 - relative QPS is <span style="background-color: #f4cccc;">0.95</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">1.01</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">1.00</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">0.97</span></li></ul><li><span style="background-color: white;">Range queries</span></li><ul><li>qr100, qr500, qr1000 - relative QPS is <span style="background-color: #eeeeee;">0.98</span>, <span style="background-color: #f4cccc;">0.72</span>, <span style="background-color: #eeeeee;">1.04</span></li></ul><li><span style="background-color: white;">Point queries</span></li><ul><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #eeeeee;">0.99</span>, <span style="background-color: #eeeeee;">1.00</span>, 
<span style="background-color: #eeeeee;">0.99</span></li></ul></ul></ul><div>From <a href="https://mdcallag.github.io/reports/24_01_11.1u.1tno.cached.ser7.fbmy.all/all.html#summary">the summary</a> for 5.6, 8.0 with many versions</div></div><div><ul><li>The base case is fbmy5635_rel_202104072149</li><li>Comparing throughput in fbmy8032_rel_221222 to the base case</li><ul><li>Write-heavy</li><ul><li>l.i0, l.x, l.i1, l.i2 - relative QPS is <span style="background-color: #f4cccc;">0.66</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.89</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.82</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.81</span></li></ul><li><span style="background-color: white;">Range queries</span></li><ul><li>qr100, qr500, qr1000 - relative QPS is <span style="background-color: #f4cccc;">0.93</span>, <span style="background-color: #d9ead3;">1.04</span>, <span style="background-color: #f4cccc;">0.69</span></li></ul><li><span style="background-color: white;">Point queries</span></li><ul><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #f4cccc;">0.86</span>, <span style="background-color: #f4cccc;">0.86</span>, <span style="background-color: #f4cccc;">0.83</span></li></ul></ul></ul><div>From <a href="https://mdcallag.github.io/reports/24_01_11.1u.1tno.cached.ser7.fbmy.latest/all.html#summary">the summary</a> for 5.6, 8.0 with latest versions</div></div><div><ul><li>The base case is fbmy5635_rel_221222</li><li>Comparing throughput in fbmy8032_rel_221222 to the base case</li><ul><li>Write-heavy</li><ul><li>l.i0, l.x, l.i1, l.i2 - relative QPS is <span style="background-color: #f4cccc;">0.69</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.91</span><span style="background-color: white;">, </span><span style="background-color: 
#f4cccc;">0.85</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.84</span></li></ul><li><span style="background-color: white;">Range queries</span></li><ul><li>qr100, qr500, qr1000 - relative QPS is <span style="background-color: #d9ead3;">1.44</span>, <span style="background-color: #f4cccc;">0.93</span>, <span style="background-color: #eeeeee;">0.98</span></li></ul><li><span style="background-color: white;">Point queries</span></li><ul><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #f4cccc;">0.86</span>, <span style="background-color: #f4cccc;">0.87</span>, <span style="background-color: #f4cccc;">0.84</span></li></ul></ul></ul><div><br /></div></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div>Mark Callaghanhttp://www.blogger.com/profile/09590445221922043181noreply@blogger.com0tag:blogger.com,1999:blog-9149523927864751087.post-21410653008942627922024-01-11T16:48:00.000-08:002024-01-17T13:00:25.786-08:00Updated Insert benchmark: MyRocks 5.6 and 8.0, medium server, cached database, v2<p>This has results for the <a href="https://smalldatum.blogspot.com/2023/12/updates-for-insert-benchmark-december.html">Insert Benchmark</a> using MyRocks 5.6 and 8.0, a medium server and a cached workload. This replaces a <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-myrocks-56-and.html">recent report</a>. The difference between this and the recent report is that I changed the benchmark scripts to reduce writeback and compaction debt between the last write-only benchmark step (l.i2) and the first read-write benchmark step (qr100). The intention is to reduce variance and make it easier to spot regressions. 
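The debt-reduction step that the benchmark scripts run between l.i2 and qr100 can be sketched as below. The `execute` callable is a hypothetical stand-in for a MySQL client; the two SET GLOBAL statements use the rocksdb_force_flush_memtable_now and rocksdb_compact_lzero_now server variables named later in this post, and the 20-second pause matches the description there:

```python
import time

def reduce_compaction_debt(execute, sleep=time.sleep, wait_secs=20):
    """Reduce MyRocks writeback/compaction debt after the l.i2 step.

    `execute` is any callable that runs one SQL statement (for example a
    DB-API cursor.execute). This is a sketch of the sequence, not the
    benchmark script itself.
    """
    # flush the memtable first
    execute("SET GLOBAL rocksdb_force_flush_memtable_now = 1")
    # give writeback a moment to settle
    sleep(wait_secs)
    # then compact L0 into L1 (only supported by builds from mid-2023 on)
    execute("SET GLOBAL rocksdb_compact_lzero_now = 1")
```

With a real connection this would be called as `reduce_compaction_debt(cursor.execute)` once the l.i2 step finishes.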
Alas, that is still an unsolved problem, especially on the range query benchmark steps.</p><p>tl;dr - context matters</p><p>The biggest concerns I have are the ~16% slowdown on the initial load (l.i0) benchmark step from MyRocks 5.6.35 to 8.0.32 and the ~5% slowdown for benchmark steps that do point queries (qp*) from MyRocks 8.0.28 to 8.0.32.</p><p></p>Comparing latest MyRocks 8.0.32 relative to latest MyRocks 5.6.35<div><ul style="text-align: left;"><li>Initial load is ~17% slower</li><li>Other write-heavy benchmark steps are ~3% slower</li><li>Range queries are between 6% and 14% faster</li><li>Point queries are ~7% faster</li></ul>Comparing latest MyRocks 8.0.32 to an old build of MyRocks 5.6.35<br /><ul style="text-align: left;"><li>Initial load is ~16% slower</li><li>Other write-heavy benchmark steps are between 2% and 6% slower</li><li>Range queries are between 5% slower and 5% faster</li><li>Point queries are 5% to 11% faster</li></ul>Comparing latest MyRocks 8.0.32 to latest MyRocks 8.0.28<br /><ul style="text-align: left;"><li>Initial load is ~4% slower</li><li>Other write-heavy benchmark steps are between 3% slower and 2% faster</li><li>Range queries are between 1% slower and 6% faster</li><li>Point queries are ~5% slower</li></ul><p><b>Build + Configuration</b></p><p>See the <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-myrocks-56-and.html">previous report</a>.</p><p><b>Benchmark</b></p><p>See the <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-myrocks-56-and.html">previous report</a>. </p><div><b>Benchmark steps</b></div><div><br /></div><div>The benchmark is run with 8 clients and a client per table.</div><div><br /></div><div>The benchmark is a sequence of steps that are run in order:</div><div><ul><li>l.i0</li><ul><li>insert 20M rows per table in PK order. The table has a PK index but no secondary indexes.
There is one connection per client.</li></ul><li>l.x</li><ul><li>create 3 secondary indexes per table. There is one connection per client.</li></ul><li>l.i1</li><ul><li>use 2 connections/client. One does inserts as fast as possible and the other does deletes at the same rate as the inserts to avoid changing the number of rows in the table. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate.</li></ul><li>l.i2</li><ul><li>like l.i1 but each transaction modifies 5 rows (small transactions).</li><li>Wait for X seconds after the step finishes to reduce variance during the read-write benchmark steps that follow where X is max(1200, 60 + #nrows/1M). While waiting do things to reduce writeback debt where the things are:</li><ul><li>MyRocks (<a href="https://github.com/mdcallag/mytools/blob/dd901e3ef42ae8f0104830b1cac3fd778980508b/bench/ibench/iq.sh#L132">see here</a>) - set rocksdb_force_flush_memtable_now to flush the memtable, wait 20 seconds and then set rocksdb_compact_lzero_now to flush L0. Note that rocksdb_compact_lzero_now wasn't supported until mid-2023.</li></ul></ul><li>qr100</li><ul><li>use 3 connections/client. One does range queries as fast as possible and performance is reported for this. The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. This step is run for 1200 seconds. If the target insert rate is not sustained then that is considered to be an SLA failure.
If the target insert rate is sustained then the step does the same number of inserts for all systems tested.</li></ul><li>qp100</li><ul><li>like qr100 except uses point queries on the PK index</li></ul><li>qr500</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qp500</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qr1000</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul><li>qp1000</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul></ul></div><div><b>Results</b></div><div><br /></div><div>The performance reports are here for</div><div><ul><li><a href="https://mdcallag.github.io/reports/24_01_11.8u.1tno.cached.fbmy56/all.html">MyRocks 5.6</a> </li><li><a href="https://mdcallag.github.io/reports/24_01_11.8u.1tno.cached.fbmy80/all.html">MyRocks 8.0</a></li><li><a href="https://mdcallag.github.io/reports/24_01_11.8u.1tno.cached.fbmy.all/all.html">MyRocks 5.6 & 8.0</a> with many 5.6 versions</li><li><a href="https://mdcallag.github.io/reports/24_01_11.8u.1tno.cached.fbmy.latest/all.html">MyRocks 5.6 & 8.0</a> with the latest versions</li></ul></div><div>The summary has 3 tables. The first shows absolute throughput by DBMS tested X benchmark step. The second has throughput relative to the version on the first row of the table. The third shows the background insert rate for benchmark steps with background inserts and all systems sustained the target rates. The second table makes it easy to see how performance changes over time.</div><div><br /></div><div>Below I use relative QPS to explain how performance changes. It is: (QPS for $me / QPS for $base) where $me is my version and $base is the version of the base case. When relative QPS is > 1.0 then performance improved over time. When it is < 1.0 then there are regressions.
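As a concrete sketch, the relative QPS and the color buckets used in the summaries can be computed as below (the thresholds are the ones stated here: red for <= 0.95, green for >= 1.05, grey in between):

```python
def relative_qps(qps_me: float, qps_base: float) -> float:
    """Relative QPS: (QPS for $me / QPS for $base)."""
    return qps_me / qps_base

def bucket(r: float) -> str:
    """Color used to highlight a relative QPS value."""
    if r <= 0.95:
        return "red"    # regression
    if r >= 1.05:
        return "green"  # improvement
    return "grey"       # within the noise
```

For example, `bucket(relative_qps(89.0, 100.0))` yields "red".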
The Q in relative QPS measures: </div><div><ul><li>insert/s for l.i0, l.i1, l.i2</li><li>indexed rows/s for l.x</li><li>range queries/s for qr100, qr500, qr1000</li><li>point queries/s for qp100, qp500, qp1000</li></ul><div>Below I use colors to highlight the relative QPS values with <span style="background-color: #f4cccc;">red</span> for <= 0.95, <span style="background-color: #d9ead3;">green</span> for >= 1.05 and <span style="background-color: #eeeeee;">grey</span> for values between 0.95 and 1.05.</div><div><br /></div><div>From <a href="https://mdcallag.github.io/reports/24_01_11.8u.1tno.cached.fbmy56/all.html#summary">the summary</a> for 5.6</div></div><div><ul style="text-align: left;"><li>The base case is fbmy5635_rel_202104072149</li><li>Comparing throughput in fbmy5635_rel_221222 to the base case</li><ul><li>Write-heavy</li><ul><li>l.i0, l.x, l.i1, l.i2 - relative QPS is <span style="background-color: #eeeeee;">1.02</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">0.97</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">0.97</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">1.01</span></li></ul><li><span style="background-color: white;">Range queries</span></li><ul><li>qr100, qr500, qr1000 - relative QPS is <span style="background-color: #f4cccc;">0.93</span>, <span style="background-color: #f4cccc;">0.92</span>, <span style="background-color: #eeeeee;">0.99</span></li></ul><li><span style="background-color: white;">Point queries</span></li><ul><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #eeeeee;">0.98</span>, <span style="background-color: #eeeeee;">1.03</span>, <span style="background-color: #eeeeee;">1.01</span></li></ul></ul></ul><div>From <a href="https://mdcallag.github.io/reports/24_01_11.8u.1tno.cached.fbmy80/all.html#summary">the summary</a> for 8.0</div></div><div><ul 
style="text-align: left;"><li>The base case is fbmy8028_rel_221222</li><li>The cost of the perf schema is <= 3% for write-heavy, <= 14% for range queries and <= 5% for point queries</li><li>Comparing throughput in fbmy8032_rel_221222 to the base case</li><ul><li>Write-heavy</li><ul><li>l.i0, l.x, l.i1, l.i2 - relative QPS is <span style="background-color: #eeeeee;">0.96</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">1.02</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">0.98</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">0.97</span></li></ul><li><span style="background-color: white;">Range queries</span></li><ul><li>qr100, qr500, qr1000 - relative QPS is <span style="background-color: #d9ead3;">1.01</span>, <span style="background-color: #d9ead3;">1.06</span>, <span style="background-color: #eeeeee;">0.99</span></li></ul><li><span style="background-color: white;">Point queries</span></li><ul><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #f4cccc;">0.95</span>, <span style="background-color: #eeeeee;">0.96</span>, <span style="background-color: #f4cccc;">0.95</span></li></ul></ul></ul><div>From <a href="https://mdcallag.github.io/reports/24_01_11.8u.1tno.cached.fbmy.all/all.html">the summary</a> for 5.6, 8.0 with many versions</div></div><div><ul style="text-align: left;"><li>The base case is fbmy5635_rel_202104072149</li><li>Comparing throughput in fbmy8032_rel_221222 to the base case</li><ul><li>Write-heavy</li><ul><li>l.i0, l.x, l.i1, l.i2 - relative QPS is <span style="background-color: #f4cccc;">0.84</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.94</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.95</span><span style="background-color: white;">, </span><span style="background-color: 
#eeeeee;">0.98</span></li></ul><li><span style="background-color: white;">Range queries</span></li><ul><li>qr100, qr500, qr1000 - relative QPS is <span style="background-color: #f4cccc;">0.95</span>, <span style="background-color: #d9ead3;">1.05</span>, <span style="background-color: #eeeeee;">1.00</span></li></ul><li><span style="background-color: white;">Point queries</span></li><ul><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #d9ead3;">1.05</span>, <span style="background-color: #d9ead3;">1.11</span>, <span style="background-color: #d9ead3;">1.09</span></li></ul></ul></ul><div>From <a href="https://mdcallag.github.io/reports/24_01_11.8u.1tno.cached.fbmy.latest/all.html">the summary</a> for 5.6, 8.0 with latest versions</div></div><div><ul style="text-align: left;"><li>The base case is fbmy5635_rel_221222</li><li>Comparing throughput in fbmy8032_rel_221222 to the base case</li><ul><li>Write-heavy</li><ul><li>l.i0, l.x, l.i1, l.i2 - relative QPS is <span style="background-color: #f4cccc;">0.83</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">0.97</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">0.97</span><span style="background-color: white;">, </span><span style="background-color: #eeeeee;">0.97</span></li></ul><li><span style="background-color: white;">Range queries</span></li><ul><li>qr100, qr500, qr1000 - relative QPS is <span style="background-color: #d9ead3;">1.06</span>, <span style="background-color: #d9ead3;">1.06</span>, <span style="background-color: #d9ead3;">1.14</span></li></ul><li><span style="background-color: white;">Point queries</span></li><ul><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #d9ead3;">1.07</span>, <span style="background-color: #d9ead3;">1.07</span>, <span style="background-color: #d9ead3;">1.07</span></li></ul></ul></ul></div></div>Mark 
Callaghanhttp://www.blogger.com/profile/09590445221922043181noreply@blogger.com0tag:blogger.com,1999:blog-9149523927864751087.post-22186588111423039682024-01-10T09:42:00.000-08:002024-01-10T13:19:26.532-08:00Updated Insert benchmark: Postgres 9.x to 16.x, small server, cached database, v2<p>I recently <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to.html">shared results</a> for the updated <a href="https://smalldatum.blogspot.com/2023/12/updates-for-insert-benchmark-december.html">Insert Benchmark</a> with Postgres versions 9.0 to 16 using a small server and cached database. Here I have results for a slightly larger but still cached database. The reason for using a larger database is to get some of the benchmark steps to run for more time.</p><p>tl;dr</p><p></p><ul style="text-align: left;"><li>Results here are similar to the <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to.html">previous results</a> although a performance problem during the l.i1 and l.i2 benchmark steps is more clear here. 
In some benchmark steps the planner can spend too much CPU time trying to determine the min and/or max value of a column by reading from the index.</li><li>While Postgres performance is mostly getting better from old to new releases, there have been regressions in a few major releases (PG 11 through 13) for benchmark steps where this is an issue.</li><li>The regressions are likely to be larger for the IO-bound benchmark but that will take a few more days to finish.</li></ul><div><b>The Problem</b></div><div><br /></div><div>I <a href="https://twitter.com/MarkCallaghanDB/status/1744385310061109307">shared details</a> about the problem here and as expected a Postgres expert <a href="https://twitter.com/petervgeoghegan/status/1744387728660160803">quickly replied</a> with advice <a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9c6ad5eaa957bdc2132b900a96e0d2ec9264d39c">pointing me</a> to a few changes that improve the problem.</div><div><br /></div><div>The problem is the pattern of inserts and deletes. Several of the benchmark steps do inserts in ascending PK order (inserts to the head) while doing deletes at the same rate to keep the number of rows fixed. The deletes are done from the other end of the table (deletes to the tail) by removing batches of rows with the smallest value for the PK.<br /><br />The PG planner has code in <a href="https://github.com/postgres/postgres/blob/5b2da240e01ecaef8181b0feebaeb69e6fefdaa0/src/backend/utils/adt/selfuncs.c#L6087">get_actual_variable_range</a> to determine the min or max value of a column when there is a predicate on that column like <i>X < $const</i> or <i>X > $const</i> and $const falls into the largest or smallest histogram bucket. From <a href="https://gist.github.com/mdcallag/f320b859a59770aaf2c12eebe138c946#file-gistfile1-txt-L27">PMP thread stacks</a>, what I see is too much time with that function on the call stack. 
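The insert-to-the-head, delete-from-the-tail pattern described above can be modeled with a toy sketch (an illustration of the workload shape, not the benchmark client):

```python
from collections import deque

def run_step(table: deque, next_pk: int, batch: int) -> int:
    """Insert `batch` rows with ascending PKs and delete the `batch`
    rows with the smallest PKs, keeping the row count constant."""
    for _ in range(batch):
        table.append(next_pk)   # inserts go to the head (largest PKs)
        next_pk += 1
    for _ in range(batch):
        table.popleft()         # deletes remove the smallest PKs
    return next_pk

# The minimum PK keeps rising, so a predicate like "PK < $const" that
# falls into the smallest histogram bucket makes get_actual_variable_range
# probe past ever more deleted index entries to find the actual minimum.
```

After each step the table holds the same number of rows but its PK range has shifted upward, which is what leaves a growing run of dead entries at the low end of the index.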
From <a href="https://gist.github.com/mdcallag/f320b859a59770aaf2c12eebe138c946#file-gistfile1-txt-L8-L9">ps output</a>, the session that does delete statements can use 10X to 100X more CPU than the session that does insert statements. From <i><a href="https://gist.github.com/mdcallag/f320b859a59770aaf2c12eebe138c946#file-gistfile1-txt-L15">explain analyze</a></i> I see that the planner spends ~100 milliseconds per delete statement.</div><div><br /></div><div><b>Build + Configuration</b></div><div><br /></div><div>See the <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to.html">previous report</a> for more details. I used all of the versions described there: 9.0.23, 9.1.24, 9.2.24, 9.3.25, 9.4.26, 9.5.25, 9.6.24, 10.23, 11.22, 12.17, 13.13, 14.10, 15.5, 16.1. And then I also tested 11.19 and 13.10.</div><div><br /></div><div><b>The Benchmark</b></div><div><br /></div><div>The benchmark is <a href="https://smalldatum.blogspot.com/2024/01/updated-insert-benchmark-postgres-9x-to.html">explained here</a> except the first benchmark step, l.i0, loads 30M rows/table here while previously it only loaded 20M. The database still fits in memory as the test server has 16G of RAM and the database tables are ~8G.</div><div><br /></div><div>The test server was named SER4 in the previous report. It has 8 cores, 16G RAM, Ubuntu 22.04 and XFS using 1 m.2 device.</div><div><br />The benchmark steps are:<div><p></p><div><ul style="text-align: left;"><li>l.i0</li><ul><li>insert 30 million rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client.</li></ul><li>l.x</li><ul><li>create 3 secondary indexes per table. There is one connection per client.</li></ul><li>l.i1</li><ul><li>use 2 connections/client. One inserts 50M rows and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). 
This step is run for a fixed number of inserts, so the run time varies depending on the insert rate.</li></ul><li>l.i2</li><ul><li>like l.i1 but each transaction modifies 5 rows (small transactions).</li><li>Wait for X seconds after the step finishes to reduce variance during the read-write benchmark steps that follow.</li></ul><li>qr100</li><ul><li>use 3 connections/client. One does range queries for 1800 seconds and performance is reported for this. The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. This step is run for a fixed amount of time. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested.</li></ul><li>qp100</li><ul><li>like qr100 except uses point queries on the PK index</li></ul><li>qr500</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qp500</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qr1000</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul><li>qp1000</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul></ul><div><b>Results</b><br /><br />The benchmark report <a href="https://mdcallag.github.io/reports/24_01_10.1u.1tno.bee.cached.pg/all.html">is here</a>.</div><div><br />I start with the summary for the <a href="https://mdcallag.github.io/reports/24_01_10.1u.1tno.bee.cached.pg/all.html#summary">current round</a> with 30M rows loaded and the <a href="https://mdcallag.github.io/reports/24_01_01.1u.1tno.bee.cached.pg/all.html#summary">previous round</a> with 20M rows loaded.
Here I focus on the benchmark steps where things are slightly different between the current and previous rounds -- the results for the l.i1 and l.i2 benchmark steps where regressions are more obvious in the current round.</div></div></div></div><div><br /></div><div>The summary has 3 tables. The first shows absolute throughput by DBMS tested X benchmark step. The second has throughput relative to the version on the first row of the table. The third shows the background insert rate for benchmark steps with background inserts and all systems sustained the target rates. The second table makes it easy to see how performance changes over time.<br /><br />There are big regressions in 11.19, 11.22 and a small one in 13.13 for the l.i1 and l.i2 benchmark steps which is visible in <a href="https://mdcallag.github.io/reports/24_01_10.1u.1tno.bee.cached.pg/all.html#summary">the summary</a>.</div><div><ul style="text-align: left;"><li>For the l.i1 benchmark step the inserts/s rate drops from ~18k/s in 9.6.24 and 10.23 to ~11k/s in 11.19 and 11.22. It also drops by ~14% from 13.10 to 13.13.</li><li>The regressions for the l.i2 benchmark step occur in the same versions but are larger. The issue is that the delete statements in l.i1 delete more rows per statement, so the planner overhead per deleted row is larger for l.i2.<br /></li></ul><div>From the iostat and vmstat metrics collected per benchmark step with both absolute and normalized values (normalized values are absolute value divided by the insert rate) I see that the CPU overhead (cpupq is CPU usecs per insert) per version is inversely correlated with the insert rate.<br /><br />This table shows the value of cpupq (CPU overhead) per version for the l.i1 and l.i2 benchmark steps. 
All of the numbers for iostat and vmstat are here <a href="https://gist.github.com/mdcallag/0ab64df12ab55a7a324c18c7518b9cce">for l.i1</a> and <a href="https://gist.github.com/mdcallag/f71fe75f09c3450a8c47e3a576d711b9">for l.i2</a>.<br /><br /><table border="1" cellpadding="3" cellspacing="0" style="border-collapse: collapse;"><tbody><tr><td><b>version</b></td><td><b>l.i1</b></td><td><b>l.i2</b></td></tr><tr><td>10.23</td><td>1253</td><td>5157</td></tr><tr><td>11.19</td><td>1619</td><td>8285</td></tr><tr><td>11.22</td><td>1623</td><td>6611</td></tr><tr><td>12.17</td><td>1263</td><td>5815</td></tr><tr><td>13.10</td><td>1222</td><td>3373</td></tr><tr><td>13.13</td><td>1367</td><td>4863</td></tr><tr><td>14.10</td><td>1126</td><td>3449</td></tr></tbody></table></div></div><div><br /></div><div>The table above includes all CPU overhead from everything running on the server (Postgres and the benchmark client). The data below shows the CPU time per session measured by ps near the end of a benchmark step. There is one connection/session that only does delete statements and another that only does insert statements. The output from ps <a href="https://gist.github.com/mdcallag/cf23c0f82fc58771544d0f53a40eb09d">is here</a>. The table below has the CPU seconds per version for both connections -- insert and delete.
There are big changes in CPU overhead for the delete connection.</div><div><br /></div><div><table border="1" cellpadding="3" cellspacing="0" style="border-collapse: collapse;"><tbody><tr><td><b>version</b></td><td><b>insert</b></td><td><b>delete</b></td></tr><tr><td>10.23</td><td>587</td><td>2587</td></tr><tr><td>11.19</td><td>574</td><td>5196</td></tr><tr><td>11.22</td><td>497</td><td>3851</td></tr><tr><td>12.17</td><td>573</td><td>3137</td></tr><tr><td>13.10</td><td>532</td><td>1278</td></tr><tr><td>13.13</td><td>548</td><td>2403</td></tr><tr><td>14.10</td><td>532</td><td>1317</td></tr></tbody></table></div><p></p>Mark Callaghanhttp://www.blogger.com/profile/09590445221922043181noreply@blogger.com0tag:blogger.com,1999:blog-9149523927864751087.post-26089969529187252422024-01-08T14:01:00.000-08:002024-01-08T14:03:16.461-08:00Explaining changes in RocksDB performance for IO-bound workloads<p>I have two recent posts for RocksDB benchmarks (<a href="https://smalldatum.blogspot.com/2023/10/checking-rocksdb-7x-and-8x-for.html">here</a> and <a href="https://smalldatum.blogspot.com/2024/01/rocksdb-8x-benchmarks-large-server-io.html">here</a>) that mention there might be a regression in IO-bound workloads starting in version 8.6 when buffered IO is used. I have one <a href="https://smalldatum.blogspot.com/2023/11/debugging-perf-changes-in-rocksdb-86-on.html">recent post</a> that started to explain the problem.
The root cause is changes to code that does readahead for compaction and the problem is worse when the value for the <a href="https://github.com/facebook/rocksdb/blob/a399bbc0370932910454029fe4d49229212ac6cf/include/rocksdb/options.h#L964">compaction_readahead_size option</a> is larger than the value for <a href="https://www.google.com/search?q=linux+min_sectors_kb">max_sectors_kb</a> of the underlying storage device(s). And this is more complex when RAID is used. Some of my test servers use SW RAID 0 and I don't know whether the value for the underlying devices or for the SW RAID device takes precedence.</p><p>tl;dr</p><p></p><ul style="text-align: left;"><li>With RocksDB 8.6+ you might need to set compaction_readahead_size so that it isn't larger than max_sectors_kb. I opened RocksDB <a href="https://github.com/facebook/rocksdb/issues/12038">issue 12038</a> for this.</li></ul><div><b>Benchmarks</b></div><div><br /></div><div>The benchmark is described in a <a href="https://smalldatum.blogspot.com/2024/01/rocksdb-8x-benchmarks-large-server-io.html">previous post</a>. The test server has 40 cores, 80 HW threads, hyperthreads enabled, 256G of RAM and XFS with SW RAID 0 over 6 devices. The value of max_sectors_kb is 128 for the SW RAID device (md2) and 1280 for the underlying SSDs.</div><div><br /></div><div>Tests were repeated for RocksDB versions 8.4.4, 8.5.4, 8.6.7, 8.7.3, 8.8.1, 8.9.2.<br /><br />I repeated the IO-bound benchmark using buffered IO in 3 setups:</div><div><ul style="text-align: left;"><li>default - this uses the default for compaction_readahead_size which is 0 prior to RocksDB 8.7 and 2MB starting in RocksDB 8.7. </li><li>crs.1MB - explicitly set compaction_readahead_size=1MB</li><li>crs.512KB - explicitly set compaction_readahead_size=512KB</li></ul></div><div>Code for compaction readahead changed in both RocksDB 8.5 and 8.6.
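Per the tl;dr above, it is worth comparing compaction_readahead_size against max_sectors_kb. On Linux the latter is exposed in sysfs for the SW RAID device and its member devices alike; a minimal sketch (device names such as md2 vary by host):

```python
from pathlib import Path

# Print max_sectors_kb for each block device that exposes it. On a SW RAID
# host, compare the md device with its member SSDs, since it is unclear
# which value takes precedence for readahead.
for q in sorted(Path("/sys/block").glob("*/queue/max_sectors_kb")):
    print(q.parts[3], q.read_text().strip())
```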
A side-effect of this change is that using compaction_readahead_size=0 is bad for performance because it means there will be (almost) no readahead.</div><div><div><br /></div></div><div><b>Results</b></div><div><br /></div><div>Below there are three graphs. The first shows throughput, the second shows the average value for read MB/s per iostat and the third shows the average value for read request size (rareq-sz) per iostat. All of these are measured during the overwrite benchmark step which is write-only and suffers when compaction cannot keep up.<br /><br />The performance summaries from the benchmark scripts <a href="https://gist.github.com/mdcallag/6968f2baa8822dff68ba8acd74b57902">are here</a> and the iostat summary <a href="https://gist.github.com/mdcallag/895cd957cccaa690d367ad1b80f2bd48">is here</a>.</div><div><br /></div><div>Summary</div><div><ul style="text-align: left;"><li>Throughput is lousy in 8.6.7 because the benchmark client (db_bench) hardwired the value for compaction_readahead_size to 0 rather than use the default of 2MB.</li><li>Throughput is best with compaction_readahead_size =1MB and worst with it =512KB</li><li>The IO rate (read MB/s) is best with compaction_readahead_size =2MB, but that doesn't translate to better throughput for the application.</li><li>The average read size from storage (rareq-sz) is best with compaction_readahead_size =1MB and worst with it =2MB</li><li>Note that better or worse here depends on context and a big part of the context is the value of max_sectors_kb.
So changing the default for compaction_readahead_size from 2MB to 1MB might be good in some cases but probably not all cases.</li></ul></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhxq1CVWvsCK7QTf29lV3hYpuQrviOBxEgjLBIZPJ_pqCxEpA0_jLnKbkPXnKNK4I9pa4lHlG8AdE_O0rimnayur2hZUwWIlonGKZC0SvBb1Zmw9PK0-91ddTe735pzNIKe57X5p485BRC_vJymy_BWM78OOoIYluV4g_Mfyc0wluOkm7uUUJefZ25tYWF0/s600/Throughput%20for%20overwrite%20by%20compaction_readahead_size.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhxq1CVWvsCK7QTf29lV3hYpuQrviOBxEgjLBIZPJ_pqCxEpA0_jLnKbkPXnKNK4I9pa4lHlG8AdE_O0rimnayur2hZUwWIlonGKZC0SvBb1Zmw9PK0-91ddTe735pzNIKe57X5p485BRC_vJymy_BWM78OOoIYluV4g_Mfyc0wluOkm7uUUJefZ25tYWF0/w640-h396/Throughput%20for%20overwrite%20by%20compaction_readahead_size.png" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg2fsUW6QYtjOG5ygmBb0rCAQJb1DoiAWsnO2qcZkQUYSJV2lvRuM2bdsWo4oj3o-39759vuqAv3ZDlWL9t9F3gLvOJzhyDhwcgYf_uWHYonyOO-EauzPlAtxui_wM0lMoym0E_3z7vMbX8LV7fZIu9suAryoe0bloIszec5CVmmGQTQDYOO86Indt3qgwn/s600/iostat%20read%20MB_s%20by%20compaction_readahead_size.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg2fsUW6QYtjOG5ygmBb0rCAQJb1DoiAWsnO2qcZkQUYSJV2lvRuM2bdsWo4oj3o-39759vuqAv3ZDlWL9t9F3gLvOJzhyDhwcgYf_uWHYonyOO-EauzPlAtxui_wM0lMoym0E_3z7vMbX8LV7fZIu9suAryoe0bloIszec5CVmmGQTQDYOO86Indt3qgwn/w640-h396/iostat%20read%20MB_s%20by%20compaction_readahead_size.png" width="640" /></a></div><div><div class="separator" style="clear: both; text-align: center;"><a 
href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiX0ZuFx1x-ckkheAGn0K5ahchoM4RVB16JfwjvmRO1rdYQa9PxAHSK9kBvndDcye7s8lYXRpDgi9BGSfvdpDSTyOrjY47qkyJpYdBMtzuEEjikPXSSVKvevWwTrrI9MaG6n1T2_nVe61Mrrq_utX2TGobaPKA6hp487TL4gfiYWmA7zxqMyXMaeeTbVlsG/s600/iostat%20average%20read%20size%20(rareq-sz)%20by%20compaction_readahead_size.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiX0ZuFx1x-ckkheAGn0K5ahchoM4RVB16JfwjvmRO1rdYQa9PxAHSK9kBvndDcye7s8lYXRpDgi9BGSfvdpDSTyOrjY47qkyJpYdBMtzuEEjikPXSSVKvevWwTrrI9MaG6n1T2_nVe61Mrrq_utX2TGobaPKA6hp487TL4gfiYWmA7zxqMyXMaeeTbVlsG/w640-h396/iostat%20average%20read%20size%20(rareq-sz)%20by%20compaction_readahead_size.png" width="640" /></a></div><br /><div><br /></div><div><br /></div><p></p></div>Mark Callaghanhttp://www.blogger.com/profile/09590445221922043181noreply@blogger.com0tag:blogger.com,1999:blog-9149523927864751087.post-43437026223259515002024-01-04T12:44:00.000-08:002024-01-08T13:31:46.556-08:00RocksDB 8.x benchmarks: large server, IO-bound<p>This post has results for performance tests in all versions of 8.x from 8.0.0 to 8.9.2 using a large server and IO-bound workload. In <a href="https://smalldatum.blogspot.com/2023/12/rocksdb-8x-benchmarks-large-server.html">a previous post</a> I shared results for the same hardware with a cached database.</p><p>tl;dr</p><p></p><ul><li>There is a small regression that arrives in RocksDB 8.6 for overwriteandwait (write-only, random writes). But only for buffered IO. I think this is caused by changes to compaction readahead. For now I will reuse RocksDB <a href="https://github.com/facebook/rocksdb/issues/12038">issue 12038</a> for this.</li></ul><div>I focus on the benchmark steps that aren't read-only because they suffer less from noise. 
These benchmark steps are fillseq, revrangewhilewriting, fwdrangewhilewriting, readwhilewriting and overwriteandwait. I also focus on leveled more so than universal, in part because there is more noise with universal but also because the workloads I care about most use leveled.</div><div><p><b>Builds</b></p><div>I compiled with gcc RocksDB 8.0.0, 8.1.1, 8.2.1, 8.3.3, 8.4.4, 8.5.4, 8.6.7, 8.7.3, 8.8.1 and 8.9.2, which are the latest patch releases.</div></div><div><br /></div><div><div><b>Benchmark</b></div><div><br />All tests used a server with 40 cores, 80 HW threads, 2 sockets, 256GB of RAM and many TB of fast NVMe SSD with Linux 5.1.2, XFS and SW RAID 0 across 6 devices. For the results here, the workload is IO-bound because the database is larger than memory. The benchmark was repeated for leveled and universal compaction using both buffered IO and O_DIRECT.</div><div><br /></div><div>Everything used the LRU block cache and the default value for compaction_readahead_size. Soon I will switch to using the hyper clock cache once RocksDB 9.0 arrives.<br /><br />I used <a href="https://github.com/mdcallag/mytools/tree/master/bench/rx2">my fork of the RocksDB benchmark scripts</a> that are wrappers to run db_bench. These run db_bench tests in a special sequence -- load in key order, read-only, do some overwrites, read-write and then write-only. The benchmark was run using 24 threads. <span style="background-color: white; color: #222222;">How I do benchmarks for RocksDB is explained </span><a href="https://smalldatum.blogspot.com/2022/08/how-i-do-performance-tests-for-rocksdb.html" style="background-color: white;">here</a><span style="background-color: white; color: #222222;"> and </span><a href="https://smalldatum.blogspot.com/2022/08/how-i-do-rocksdb-performance-tests-part.html" style="background-color: white;">here</a><span style="background-color: white; color: #222222;">.
The command line to run them is: </span></div><div><span style="background-color: white; color: #222222;"><blockquote><span style="font-family: courier; font-size: x-small;">bash x3.sh 24 no 3600 c40r256bc180 40000000 4000000000 iobuf iodir</span></blockquote></span></div><div>A spreadsheet with all results <a href="https://docs.google.com/spreadsheets/d/11WwnwBmB6HvbnW_H1rEpMMFk1s2zu4p_uxtbu8Mbv9c/edit?usp=sharing">is here</a> and performance summaries are here:</div></div><div><ul style="text-align: left;"><li>buffered IO - <a href="https://gist.github.com/mdcallag/39adbf9dae4665ec7c1ca1ecd27e3e6b">for leveled</a> and <a href="https://gist.github.com/mdcallag/85dfd2162ef2a8d37c3196c87d315643">for universal</a></li><li>O_DIRECT - <a href="https://gist.github.com/mdcallag/4c5a3884383e36165a7fd5bb4a4b5237">for leveled</a> and <a href="https://gist.github.com/mdcallag/3578ab18bab559b2cf96f8bcd188090d">for universal</a></li></ul><div><b>Results: leveled</b></div></div><div><br /></div><div>There is one fake regression in overwriteandwait for RocksDB 8.6.7. The issue is that the db_bench benchmark client ignored a new default value for compaction_readahead_size. That has been fixed in 8.7.</div><div><br /></div><div>There is one real regression in overwriteandwait that probably arrived in 8.6 and is definitely in 8.7 through 8.9. The throughput for overwriteandwait drops about 5% from 8.5 to 8.7+. I assume this is from changes to compaction readahead that arrived in 8.6.
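One way to quantify how much compaction threads wait on IO is the ratio of compaction CPU seconds (c_csecs) to compaction wall clock seconds (c_wsecs) from the performance summaries linked above. A small sketch using approximate values from this post:

```python
def cpu_to_wall_ratio(c_csecs, c_wsecs):
    """Fraction of compaction wall clock time spent on CPU; values near
    1.0 mean compaction threads rarely stall on IO."""
    return c_csecs / c_wsecs

# Approximate overwriteandwait values with buffered IO from this post:
print(round(cpu_to_wall_ratio(18000, 18200), 2))  # RocksDB 8.5: ~0.99
print(round(cpu_to_wall_ratio(17200, 18700), 2))  # RocksDB 8.7+: ~0.92
```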
These changes are for readahead done when buffered IO is used, but not when O_DIRECT is used and in the charts below the regression does not repeat with O_DIRECT.</div><div><br /></div><div>From the performance summary for overwriteandwait with buffered IO (<a href="https://gist.github.com/mdcallag/39adbf9dae4665ec7c1ca1ecd27e3e6b#file-summary-tsv-L127-L130">see here</a>)<br /><ul style="text-align: left;"><li>compaction wall clock time (c_wsecs) increases by ~3% from ~18200 in 8.5 to ~18700 in 8.7+</li><li>compaction CPU seconds (c_csecs) decreases by ~5% from ~18000 in 8.5 to ~17200 in 8.7+</li><li>the c_csecs / c_wsecs ratio is ~0.99 for 8.0 thru 8.5 and drops to ~0.92 in 8.7+, so one side effect of the change in 8.6 is that compaction threads see more IO latency</li><li>this issue doesn't repeat with O_DIRECT, <a href="https://gist.github.com/mdcallag/4c5a3884383e36165a7fd5bb4a4b5237#file-summary-tsv-L127-L130">see here</a></li></ul><div>From iostat metrics during overwriteandwait with buffered IO</div><div><ul style="text-align: left;"><li>rawait (r_await) drops from 0.21 in 8.5 to ~0.08 in 8.7+</li><li>rareq-sz (rareqsz) drops from 28.3 in 8.5 to ~9 in 8.7+</li><li>the decrease in rawait was expected given the decrease in rareq-sz; the real problem is the drop in rareq-sz as the only reads during overwriteandwait are from compaction</li><li>this issue doesn't repeat with O_DIRECT</li></ul></div><div><div><span style="font-family: courier; font-size: xx-small;">leveled, buffered IO</span></div><div><span style="font-family: courier; font-size: xx-small;">c rps rmbps rrqmps rawait rareqsz wps wmbps wrqmps wawait wareqsz ver</span></div><div><span style="font-family: courier; font-size: xx-small;">3762 4762 70.0 0.00 0.21 28.3 5648 576.2 0.00 0.06 104.3 8.5.4</span></div><div><span style="font-family: courier; font-size: xx-small;">3879 21308 90.9 0.00 0.05 4.2 4393 447.7 0.00 0.06 104.8 8.6.7</span></div><div><span style="font-family: courier; font-size:
xx-small;">3790 9029 79.9 0.00 0.07 9.0 5229 535.8 0.00 0.06 105.2 8.7.3</span></div><div><span style="font-family: courier; font-size: xx-small;">3790 9678 74.5 0.00 0.08 8.3 5283 539.6 0.00 0.06 104.8 8.8.1</span></div><div><span style="font-family: courier; font-size: xx-small;">3790 9808 75.5 0.00 0.08 8.5 5298 540.1 0.00 0.06 104.7 8.9.2</span></div></div><div><span style="font-family: courier; font-size: xx-small;"><br /></span></div><div><span style="font-family: courier; font-size: xx-small;">leveled, O_DIRECT</span></div><div><span style="font-family: courier; font-size: xx-small;"><div>c rps rmbps rrqmps rawait rareqsz wps wmbps wrqmps wawait wareqsz ver</div><div>3765 5236 619.4 0.00 0.32 120.5 5779 687.7 0.00 0.07 121.1 8.5.4</div><div>4187 37528 340.4 0.00 0.09 9.2 1908 218.5 0.00 0.06 118.0 8.6.7</div><div>3754 5170 612.6 0.00 0.33 121.1 5708 679.1 0.00 0.07 121.4 8.7.3</div><div>3759 5084 602.8 0.00 0.35 121.1 5612 668.0 0.00 0.08 121.5 8.8.1</div><div>3759 5048 598.1 0.00 0.37 121.1 5574 663.3 0.00 0.08 121.4 8.9.2</div></span></div></div><div><br />These charts show relative QPS which is (QPS for my version / QPS for RocksDB 8.0).</div><div><br /></div><div>First is with buffered IO (no O_DIRECT)</div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgLU2GBJo96DlTVErHIf3NFrANfhUupRZPDT2bDPWkVX059dIGpdgEeatNpLxivBOj8sNJZq3x3LOBRj-ZkbzUOIxKlzYKAra6y6_QgS-eXEUxgTnPafHDQnXxt4s7SyXmS19l2LO-9jqIfUvLXRUzq8FaEZEhL3VZdmHuP0ab0QWQwpEYlRE-P7nvk6Cie/s600/QPS%20relative%20to%20RocksDB%208.0.0_%20buffered%20IO,%20leveled.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" 
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgLU2GBJo96DlTVErHIf3NFrANfhUupRZPDT2bDPWkVX059dIGpdgEeatNpLxivBOj8sNJZq3x3LOBRj-ZkbzUOIxKlzYKAra6y6_QgS-eXEUxgTnPafHDQnXxt4s7SyXmS19l2LO-9jqIfUvLXRUzq8FaEZEhL3VZdmHuP0ab0QWQwpEYlRE-P7nvk6Cie/w640-h396/QPS%20relative%20to%20RocksDB%208.0.0_%20buffered%20IO,%20leveled.png" width="640" /></a></div><div>Next is with O_DIRECT (no OS page cache)</div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhvmm13jzvFHf04bc-XbC7tYtwOV03xo81s1I-pU0GnQ3s9NdYv-_MZKHz36Ynjbpu_Vh1dgiEtDZCLcKB1T6HYLU23f8ZfSIXGNlVfdVxKGchRQOTdWRAEd8hUCLwtvXHphJh9zuOtvWgMvrA4wrkx9Q9QXpe5qOduxZszR-B3rFVSSJ-hl5NAIEZlpioU/s600/QPS%20relative%20to%20RocksDB%208.0.0_%20O_DIRECT,%20leveled.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhvmm13jzvFHf04bc-XbC7tYtwOV03xo81s1I-pU0GnQ3s9NdYv-_MZKHz36Ynjbpu_Vh1dgiEtDZCLcKB1T6HYLU23f8ZfSIXGNlVfdVxKGchRQOTdWRAEd8hUCLwtvXHphJh9zuOtvWgMvrA4wrkx9Q9QXpe5qOduxZszR-B3rFVSSJ-hl5NAIEZlpioU/w640-h396/QPS%20relative%20to%20RocksDB%208.0.0_%20O_DIRECT,%20leveled.png" width="640" /></a></div><div><b>Results: universal</b></div><div><b><br /></b></div><div>Summary</div><div><ul style="text-align: left;"><li>Just like above for leveled, there is a bogus regression for overwriteandwait with RocksDB 8.6</li><li>Results here have more variance than the results for leveled above. While I have yet to prove this, universal compaction benchmarks are likely prone to more variance. 
So I don't think there are regressions here.</li></ul></div><div>These charts show relative QPS which is (QPS for my version / QPS for RocksDB 8.0).</div><div><br /></div><div>First is with buffered IO (no O_DIRECT)</div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgR_756Gl7muxrhmCZIfqbgxLJ5BhsedlFfqD2_GtlTTq6IRBuREt7mK4ffPVPMUXSxuAmjt74vnShQ8aQfR-ZEV60QTuiq_P_mpz5lJhvUiwyYzfVbziVyun-Yy1tehCISgrgMod4lerLIeeH90QYhsWtYICGjgwiKoQ2YtH5tvS_doWwTBb-KpEhD9NJd/s600/QPS%20relative%20to%20RocksDB%208.0.0_%20buffered%20IO,%20universal.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgR_756Gl7muxrhmCZIfqbgxLJ5BhsedlFfqD2_GtlTTq6IRBuREt7mK4ffPVPMUXSxuAmjt74vnShQ8aQfR-ZEV60QTuiq_P_mpz5lJhvUiwyYzfVbziVyun-Yy1tehCISgrgMod4lerLIeeH90QYhsWtYICGjgwiKoQ2YtH5tvS_doWwTBb-KpEhD9NJd/w640-h396/QPS%20relative%20to%20RocksDB%208.0.0_%20buffered%20IO,%20universal.png" width="640" /></a></div><div>Next is with O_DIRECT (no OS page cache)</div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhf6j0J-C0EF-rEgHx96gn-j-ZMoHJ6oH_87jJmDcRlg2zEPho-KssVOcREgHWcPyKcBZ2mW0ilD0DZbw0DIui4sdwhyphenhyphenJpgzKt2jM6DuIOpJKFvbBTXlhBSI85GComMJ9LPVe_rqNBiPHxOlx9mVMxug32GyAbwV46pER-XCsVDpr2FJu37nwYJLU-olqRO/s600/QPS%20relative%20to%20RocksDB%208.0.0_%20O_DIRECT,%20universal.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" 
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhf6j0J-C0EF-rEgHx96gn-j-ZMoHJ6oH_87jJmDcRlg2zEPho-KssVOcREgHWcPyKcBZ2mW0ilD0DZbw0DIui4sdwhyphenhyphenJpgzKt2jM6DuIOpJKFvbBTXlhBSI85GComMJ9LPVe_rqNBiPHxOlx9mVMxug32GyAbwV46pER-XCsVDpr2FJu37nwYJLU-olqRO/w640-h396/QPS%20relative%20to%20RocksDB%208.0.0_%20O_DIRECT,%20universal.png" width="640" /></a></div><div><br /></div><div><br /></div>Mark Callaghanhttp://www.blogger.com/profile/09590445221922043181noreply@blogger.com0tag:blogger.com,1999:blog-9149523927864751087.post-26716427175201796862024-01-03T17:29:00.000-08:002024-01-03T17:46:38.995-08:00innodb_log_writer_threads and the Insert Benchmark<p>I am wary of innodb_log_writer_threads=ON. It is <a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_log_writer_threads">on by default</a> and has been a problem for me in the past. It would be great to learn from people for whom it is useful. This is a follow-up to a previous post where I mentioned that things looked bad with innodb_log_writer_threads.
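The workaround discussed in this post is a one-line my.cnf change. A minimal fragment (other settings come from the configurations linked in this post):

```ini
# Hypothetical my.cnf fragment: disable dedicated log writer threads,
# the workaround for the regressions described in this post.
[mysqld]
innodb_log_writer_threads=OFF
```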
I opened <a href="https://bugs.mysql.com/bug.php?id=113485">bug 113485</a> to suggest that one of the following should be done: make the default =OFF or at least let innodb_dedicated_server disable it on small servers.</p><p>The <a href="https://dev.mysql.com/doc/refman/8.0/en/optimizing-innodb-logging.html">MySQL docs</a> suggest only using =ON for high-concurrency workloads, alas it is =ON by default.<br /><span style="background-color: white; color: #555555; font-size: 14.256px;"></span></p><blockquote><span style="font-family: inherit;">Dedicated log writer threads can improve performance on high-concurrency systems, but for low-concurrency systems, disabling dedicated log writer threads provides better performance.</span></blockquote><p>tl;dr, v1</p><p></p><ul style="text-align: left;"><li>innodb_log_writer_threads seems to make things worse most of the time</li><li>the workaround is innodb_log_writer_threads=OFF</li><li>sadly, it is =ON by default</li></ul><p></p><p>tl;dr, v2</p><p></p><ul style="text-align: left;"><li>Sometimes innodb_log_writer_threads helps, more often it doesn't in my tests</li><li>innodb_log_writer_threads increases the frequency of fsyncs per commit by a large amount -- between 3X and 200X depending on the setup. The impact from this is less obvious on the 40-core server that has a fast fsync. The impact is really bad on a server that doesn't have a fast fsync.</li><li>innodb_log_writer_threads shouldn't be used on small servers with <= X CPU cores. 
For the servers I tested X=8 but I suspect it is even larger.</li><li>It can be lousy for performance when fsync latency is high (several milliseconds)</li><li>I filed a <a href="https://bugs.mysql.com/bug.php?id=113485">feature request</a> to change the default for innodb_log_writer_threads to =OFF and/or detect when the number of CPU cores is not large and disable it by default.</li></ul><p></p><p style="font-weight: bold;"><b>The bugs</b></p>The redo log code was changed in a big way in MySQL 8.0 and my experience with that has not been great. It was nice to get the ability to disable the new features, but that (<a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_log_writer_threads">innodb_log_writer_threads</a>) didn't arrive until 8.0.22.<br /><ul style="text-align: left;"><li>I reported <a href="https://bugs.mysql.com/bug.php?id=90670">bug 90670</a> for MySQL 8.0.11. This is a crashing bug that was fixed in 8.0.13. It was found via sysbench. I assume I could have found it with the Insert Benchmark.</li><li>I reported <a href="https://bugs.mysql.com/bug.php?id=90890">bug 90890</a> for MySQL 8.0.11. This is a perf bug that was fixed in 8.0.14. It was found via the Insert Benchmark. The perf bug is that the CPU overhead/operation had doubled vs 5.7 releases.</li><li>I reported <a href="https://bugs.mysql.com/bug.php?id=90993">bug 90993</a> for MySQL 8.0.11. This is a crashing bug that was fixed in 8.0.13. It was found via the Insert Benchmark.</li><li>I reported <a href="https://bugs.mysql.com/bug.php?id=102238">bug 102238</a> for MySQL 8.0.22. This is a perf bug that is still open and the workaround is to use innodb_log_writer_threads=OFF. </li></ul><div><b>The tuning options</b></div><div><br />When innodb_log_writer_threads=ON there will be more spinning, which not only means more CPU can be burned, but also that there are 3 new config options for tuning how the spin wait loops happen. By my count that is 3 options too many.
I did not try to tune these. From <a href="https://dev.mysql.com/doc/refman/8.0/en/optimizing-innodb-logging.html">the docs</a>, the config options for this feature are:<br /><ul style="text-align: left;"><li><a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_log_wait_for_flush_spin_hwm">innodb_log_wait_for_flush_spin_hwm</a></li><li><a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_log_spin_cpu_abs_lwm">innodb_log_spin_cpu_abs_lwm</a></li><li><a href="https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_log_spin_cpu_pct_hwm">innodb_log_spin_cpu_pct_hwm</a></li></ul></div><div><b>Benchmark</b></div><div><br />I share results from 3 servers:</div><div><ul style="text-align: left;"><li>8-core</li><ul><li>8 CPU cores, 16G RAM, XFS, 1 m.2 device, Ubuntu 22.04</li><li>benchmark uses 1 client</li></ul><li>32-core</li><ul><li>32 cores, hyperthreads off, 128G RAM, XFS with SW RAID 0 over 2 m.2 devices, Ubuntu 22.04</li><li>fsync latency is not great on this host, maybe ~5 milliseconds</li><li>benchmark uses 12 clients</li></ul><li>40-core</li><ul><li>40 cores, 80 HW threads, hyperthreads on, 256G RAM, XFS with SW RAID 0 over 4 SSDs</li><li>fsync latency is much better than on the 32-core host, maybe <= 200 microseconds</li><li>benchmark uses 16, 24, 32 and 40 clients</li></ul></ul></div><div>I used the <a href="https://smalldatum.blogspot.com/2023/05/updates-to-insert-benchmark.html">Insert Benchmark</a> with a cached database. With X clients there were X tables and a client per table. I focus on the first three benchmark steps, which are write-heavy. The spreadsheet with all results <a href="https://docs.google.com/spreadsheets/d/1eg6VMQ7uZLmQAJdAPGZkkXbEoJVioWgg0dkcbTa6PmM/edit?usp=sharing">is here</a>. The benchmark steps are:</div><div><ul style="text-align: left;"><li>l.i0</li><ul><li>does the initial load in PK order without secondary indexes and 1 connection/client.
This inserts 20M rows/table.</li><li>each commit inserts 100 rows for big transactions or 10 rows for small transactions. Inserts are in key order so this only makes a few pages dirty. And there are no secondary indexes.</li></ul><li>l.x</li><ul><li>creates 3 secondary indexes per table. There is 1 connection/client.</li></ul><li>l.i1</li><ul><li>does random inserts matched by random deletes. There are 2 connections/client -- one for inserts, one for deletes. This step is the most likely to make the CPU oversubscribed.</li><li>each commit inserts 50 rows for big transactions or 5 rows for small transactions. For each row there are also 3 secondary indexes to maintain, which increases the amount of redo per commit. Inserts are in PK order but not in order for any of the secondary indexes so these make more pages dirty compared to l.i0.</li></ul></ul></div><div>The benchmark was repeated in 2 configurations -- for innodb_log_writer_threads =ON and =OFF. There are 2 files per server -- one with innodb_log_writer_threads =ON and one with it =OFF. Both have sync_binlog=1 and innodb_flush_log_at_trx_commit=1. The my.cnf files <a href="https://github.com/mdcallag/mytools/tree/master/bench/conf/arc/jan24.lwt">are here</a>. I did not tune the 3 innodb_log_writer_threads options.</div><div><br /></div><div>The benchmark was repeated for two workload types -- big and small transactions. For big transactions I used the Insert Benchmark as-is so that the rows/commit is 100 for l.i0 and 50 for l.i1. For small transactions I reduced that to 10 for l.i0 and 5 for l.i1. </div><div><br /></div><div><div><b>Results</b></div></div><div><b><br /></b></div><div><div>Throughput in the charts below measures the following:</div><div><ul><li>l.i0 - inserts/second</li><li>l.x - indexed rows/second</li><li>l.i1 - inserts/second</li></ul></div></div><div>These charts show the throughput for MySQL with innodb_log_writer_threads =OFF relative to =ON.
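The plotted values are simple per-step ratios. As a sketch, they can be computed like this -- note that the throughput numbers below are hypothetical placeholders, not measured results:

```python
# Sketch: compute throughput with innodb_log_writer_threads=OFF relative to =ON.
# The numbers are hypothetical placeholders; real values come from the
# benchmark output for each step (l.i0, l.x, l.i1).
qps_on = {"l.i0": 95000, "l.x": 260000, "l.i1": 10000}
qps_off = {"l.i0": 100000, "l.x": 250000, "l.i1": 30000}

relative = {step: qps_off[step] / qps_on[step] for step in qps_off}
for step, ratio in sorted(relative.items()):
    # ratio > 1.0 means MySQL was faster with =OFF for this step
    print(f"{step}: {ratio:.2f}")
```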
A value greater than 1 means that MySQL is faster with =OFF. Below I use <i>LWT</i> in place of <i>innodb_log_writer_threads</i>.</div><div><br /></div><div>For the 40-core server</div><div><ul style="text-align: left;"><li>l.i0 - throughput is always (slightly) better with LWT=OFF</li><li>l.x - throughput is always (slightly) better with LWT=ON</li><li>l.i1 - results are mixed.</li><ul><li>The best case for LWT=ON is with 40 clients and big transactions. TODO was CPU saturated? Note that LWT=ON does better relative to LWT=OFF as the concurrency increases.</li><li>The best cases for LWT=OFF are with lower concurrency levels.</li></ul></ul></div><div class="separator" style="clear: both; text-align: left;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikLWjLceinnvRT5yw_wRNJJqVwAEA4lXlqOa1akeVBYBnFyjDJopR6XPynYWjYOYJ-oYVrOAb1okHgbU8euJJITDnFRwdcOXAvdBzH7ohLiboPrAHx6amwWGC2z2P9n5AlfKhCJhEETB2vjg9s3rX-2F5IPDzcjUR3BeYZG6xE1BKjFIZjy2exKUpAGVXb/s600/log_writer_threads=OFF%20_%20=ON,%2040-core%20server.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikLWjLceinnvRT5yw_wRNJJqVwAEA4lXlqOa1akeVBYBnFyjDJopR6XPynYWjYOYJ-oYVrOAb1okHgbU8euJJITDnFRwdcOXAvdBzH7ohLiboPrAHx6amwWGC2z2P9n5AlfKhCJhEETB2vjg9s3rX-2F5IPDzcjUR3BeYZG6xE1BKjFIZjy2exKUpAGVXb/w640-h396/log_writer_threads=OFF%20_%20=ON,%2040-core%20server.png" width="640" /></a>For the 32-core server</div><div class="separator" style="clear: both; text-align: left;"><ul style="text-align: left;"><li>I had to use log scale because the differences were huge for l.i1</li><li>Fsync latency on this host might be ~5 milliseconds which is large</li><li>LWT=OFF is up to ~5X faster for l.i0 and up to ~100X for l.i1 relative to LWT=ON</li><li>I try to explain the performance differences in the sections that follow</li></ul></div><div class="separator" style="clear: both; 
text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg8blvwYsTYtBZpdrZqIkaKwhZuNeFLz7h6SGHshDPhEZ9xgfJSxiySEqkHEwLg2LAhfqOPooTFL4l1-i2oarwXEGLAz8OmOmwPYPhdW7PmadeV-5hfsUWseal4oTBdmFPbeK7QQx4JIz33aL6KgyiphhKk914Zj9ki-ssiEIOBF28XwZ6W5F-ebS29YFPH/s600/log_writer_threads=OFF%20_%20=ON,%2032-core%20server.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg8blvwYsTYtBZpdrZqIkaKwhZuNeFLz7h6SGHshDPhEZ9xgfJSxiySEqkHEwLg2LAhfqOPooTFL4l1-i2oarwXEGLAz8OmOmwPYPhdW7PmadeV-5hfsUWseal4oTBdmFPbeK7QQx4JIz33aL6KgyiphhKk914Zj9ki-ssiEIOBF28XwZ6W5F-ebS29YFPH/w640-h396/log_writer_threads=OFF%20_%20=ON,%2032-core%20server.png" width="640" /></a></div><div>For the 8-core server</div><div><ul style="text-align: left;"><li>LWT=OFF is always faster than =ON, up to 3X faster</li></ul></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiqviJc9IHHk6jQ23fOHICWWiqOxQ1w5uW9IUs_-NNKkgskgyM0szWvHnePzZcjN_vpzhZNxQ18eVfGHneMfFkcJL0PmPOj1mLSe8mcMoIBM53Xet4AoM6GlKANLcgQPGYBSGJCles6AtE-vNc-8JpQ8Pnkk4RHnQp12XMQ8iF82GA0tBdNtD2luYqqG6ZH/s600/log_writer_threads=OFF%20_%20=ON,%208-core%20server.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="371" data-original-width="600" height="396" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiqviJc9IHHk6jQ23fOHICWWiqOxQ1w5uW9IUs_-NNKkgskgyM0szWvHnePzZcjN_vpzhZNxQ18eVfGHneMfFkcJL0PmPOj1mLSe8mcMoIBM53Xet4AoM6GlKANLcgQPGYBSGJCles6AtE-vNc-8JpQ8Pnkk4RHnQp12XMQ8iF82GA0tBdNtD2luYqqG6ZH/w640-h396/log_writer_threads=OFF%20_%20=ON,%208-core%20server.png" width="640" /></a></div><div style="font-weight: bold;"><b>The big problem</b></div><div style="font-weight: bold;"><b><br /></b></div><div>The big problem is that with innodb_log_writer_threads =ON the number 
of fsyncs per commit is between 3X and 200X larger vs =OFF. The extra details about iostat, vmstat and the fsync frequency (via the OS fsyncs counters) are here <a href="https://gist.github.com/mdcallag/9df131055d380dfdc98b88f5474ba782">for l.i0</a> and <a href="https://gist.github.com/mdcallag/aedac8b00b12d910754aa83490ebade7">for l.i1</a>.</div><div><br /></div><div>My helper scripts archive the output from SHOW ENGINE INNODB STATUS at the end of each benchmark step and from that I grep the line with <i>OS fsyncs.</i> The l.i0 and l.i1 benchmark steps do the same number of inserts for LWT =ON and =OFF so I just compute the ratio of (fsyncs with =ON) / (fsyncs with =OFF) and the results are much worse than I expected. I didn't try to change the 3 options related to the LWT feature, other than innodb_log_writer_threads=OFF.<br /><br />The table below lists the fsync ratio, which is:</div><div><blockquote>(fsyncs with innodb_log_writer_threads =ON) / (fsyncs with it =OFF)</blockquote></div><div><br /></div><div><google-sheets-html-origin><table border="1" cellpadding="0" cellspacing="0" data-sheets-root="1" dir="ltr" style="border-collapse: collapse; border: none; font-family: Arial; font-size: 10pt; table-layout: fixed; width: 0px;" xmlns="http://www.w3.org/1999/xhtml"><colgroup><col width="100"></col><col width="100"></col><col width="100"></col><col width="100"></col><col width="119"></col></colgroup><tbody><tr style="height: 21px;"><td data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;Server&quot;}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">Server</td><td data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;# clients&quot;}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;"># clients</td><td data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;transaction size&quot;}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">transaction size</td><td
data-sheets-value="{"1":2,"2":"step"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">step</td><td data-sheets-value="{"1":2,"2":"fsync ratio"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">fsync ratio</td></tr><tr style="height: 21px;"><td data-sheets-value="{"1":2,"2":"40-core"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">40-core</td><td data-sheets-value="{"1":3,"3":24}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">24</td><td data-sheets-value="{"1":2,"2":"small"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">small</td><td data-sheets-value="{"1":2,"2":"l.i0"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">l.i0</td><td data-sheets-value="{"1":2,"2":"~3.5"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">~3.5</td></tr><tr style="height: 21px;"><td data-sheets-value="{"1":2,"2":"40-core"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">40-core</td><td data-sheets-value="{"1":3,"3":24}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">24</td><td data-sheets-value="{"1":2,"2":"big"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">big</td><td data-sheets-value="{"1":2,"2":"l.i0"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">l.i0</td><td data-sheets-value="{"1":2,"2":"~3.2"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: 
bottom;">~3.2</td></tr><tr style="height: 21px;"><td data-sheets-value="{"1":2,"2":"40-core"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">40-core</td><td data-sheets-value="{"1":3,"3":40}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">40</td><td data-sheets-value="{"1":2,"2":"small"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">small</td><td data-sheets-value="{"1":2,"2":"l.i0"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">l.i0</td><td data-sheets-value="{"1":2,"2":"~4.2"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">~4.2</td></tr><tr style="height: 21px;"><td data-sheets-value="{"1":2,"2":"40-core"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">40-core</td><td data-sheets-value="{"1":3,"3":40}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">40</td><td data-sheets-value="{"1":2,"2":"big"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">big</td><td data-sheets-value="{"1":2,"2":"l.i0"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">l.i0</td><td data-sheets-value="{"1":2,"2":"~5"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">~5</td></tr><tr style="height: 21px;"><td style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;"></td><td style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;"></td><td style="border: 1px solid rgb(204, 204, 
204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;"></td><td style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;"></td><td style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;"></td></tr><tr style="height: 21px;"><td data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;32-core&quot;}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">32-core</td><td data-sheets-value="{&quot;1&quot;:3,&quot;3&quot;:12}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">12</td><td data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;small&quot;}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">small</td><td data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;l.i1&quot;}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">l.i1</td><td data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;~18&quot;}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">~18</td></tr><tr style="height: 21px;"><td data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;32-core&quot;}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">32-core</td><td data-sheets-value="{&quot;1&quot;:3,&quot;3&quot;:12}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">12</td><td data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;big&quot;}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">big</td><td data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;l.i1&quot;}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">l.i1</td><td data-sheets-value="{&quot;1&quot;:2,&quot;2&quot;:&quot;~200&quot;}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">~200</td></tr><tr
style="height: 21px;"><td data-sheets-value="{"1":2,"2":"40-core"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">40-core</td><td data-sheets-value="{"1":3,"3":24}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">24</td><td data-sheets-value="{"1":2,"2":"small"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">small</td><td data-sheets-value="{"1":2,"2":"l.i1"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">l.i1</td><td data-sheets-value="{"1":2,"2":"~4.6"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">~4.6</td></tr><tr style="height: 21px;"><td data-sheets-value="{"1":2,"2":"40-core"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">40-core</td><td data-sheets-value="{"1":3,"3":24}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">24</td><td data-sheets-value="{"1":2,"2":"big"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">big</td><td data-sheets-value="{"1":2,"2":"l.i1"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">l.i1</td><td data-sheets-value="{"1":2,"2":"~7.5"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">~7.5</td></tr><tr style="height: 21px;"><td data-sheets-value="{"1":2,"2":"40-core"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">40-core</td><td data-sheets-value="{"1":3,"3":40}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: 
right; vertical-align: bottom;">40</td><td data-sheets-value="{"1":2,"2":"small"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">small</td><td data-sheets-value="{"1":2,"2":"l.i1"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">l.i1</td><td data-sheets-value="{"1":2,"2":"~5.8"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">~5.8</td></tr><tr style="height: 21px;"><td data-sheets-value="{"1":2,"2":"40-core"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">40-core</td><td data-sheets-value="{"1":3,"3":40}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; text-align: right; vertical-align: bottom;">40</td><td data-sheets-value="{"1":2,"2":"big"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">big</td><td data-sheets-value="{"1":2,"2":"l.i1"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">l.i1</td><td data-sheets-value="{"1":2,"2":"~12.3"}" style="border: 1px solid rgb(204, 204, 204); overflow: hidden; padding: 2px 3px; vertical-align: bottom;">~12.3</td></tr></tbody></table></google-sheets-html-origin></div><div><br /></div><div style="font-weight: bold;"><b><br /></b></div><b>Explaining: 40-core server</b><div><br /></div><div>Details from iostat and vmstat are here <a href="https://gist.github.com/mdcallag/9df131055d380dfdc98b88f5474ba782#file-i0-txt-L69">for l.i0</a> and <a href="https://gist.github.com/mdcallag/aedac8b00b12d910754aa83490ebade7#file-i1-txt-L88">for l.i1</a></div><div><ul><li>context switches/operation (cs/q) are larger with LWT=ON</li><li>CPU/operation (cpu/q) is larger with LWT=ON</li></ul></div><div><b>Explaining: 32-core server</b></div><div><br 
/></div><div>Details from iostat and vmstat are here <a href="https://gist.github.com/mdcallag/9df131055d380dfdc98b88f5474ba782#file-i0-txt-L29">for l.i0</a> and <a href="https://gist.github.com/mdcallag/aedac8b00b12d910754aa83490ebade7#file-i1-txt-L30">for l.i1</a></div><div><ul style="text-align: left;"><li>context switches/operation (cs/q) are much larger with LWT=ON</li><li>CPU/operation (cpu/q) is much larger with LWT=ON</li></ul></div><div><b>Explaining: 8-core server</b></div><div><br /></div><div>Details from iostat and vmstat are here <a href="https://gist.github.com/mdcallag/9df131055d380dfdc98b88f5474ba782#file-i0-txt-L2">for l.i0</a> and <a href="https://gist.github.com/mdcallag/aedac8b00b12d910754aa83490ebade7#file-i1-txt-L2">for l.i1</a></div><div><ul style="text-align: left;"><li>context switches/operation (cs/q) are much larger with LWT=ON</li><li>CPU/operation (cpu/q) is much larger with LWT=ON</li></ul><div class="separator" style="clear: both; text-align: left;"><b><br /></b></div><div class="separator" style="clear: both; text-align: left;"><br /></div><p></p></div>Mark Callaghanhttp://www.blogger.com/profile/09590445221922043181noreply@blogger.com0tag:blogger.com,1999:blog-9149523927864751087.post-26728051073142874512024-01-02T14:47:00.000-08:002024-01-12T09:27:24.893-08:00Updated Insert benchmark: MyRocks 5.6 and 8.0, small server, cached database<p>This has results for the <a href="https://smalldatum.blogspot.com/2023/12/updates-for-insert-benchmark-december.html">Insert Benchmark</a> using MyRocks 5.6 and 8.0 using a small server and cached workload. A recent writeup from the same benchmark using a medium server <a href="https://smalldatum.blogspot.com/2023/08/checking-myrocks-56-for-regressions_10.html">is here</a>.</p>For old MyRocks 5.6.35 vs latest 5.6.35<br /><ul><li>There might be large regressions for the range query tests (qr*). These might also be noise. I have more work in progress to figure that out. 
I don't see such a large regression <a href="https://smalldatum.blogspot.com/2023/08/checking-myrocks-56-for-regressions_10.html">on a medium server</a>.</li></ul><div>For latest MyRocks 5.6.35 vs latest MyRocks 8.0.32</div><div><ul style="text-align: left;"><li>There might be large regressions for the range query tests (qr*). These might also be noise. I have more work in progress to figure that out. I don't see such a large regression <a href="https://smalldatum.blogspot.com/2023/08/checking-myrocks-56-for-regressions_10.html">on a medium server</a>.</li></ul></div><div><div><b>Build + Configuration</b><br /></div><p></p><p></p><p></p><div></div><p></p><div><div>I tested MyRocks 5.6.35, 8.0.28 and 8.0.32 using the latest code as of December 2023. I also repeated tests for older builds for MyRocks 5.6. These were compiled from source. All builds use CMAKE_BUILD_TYPE =Release.</div><div><br /></div><div>MyRocks 5.6.35</div><div><ul style="text-align: left;"><li>fbmy5635_rel_221222</li><ul><li>compiled with gcc 11.4.0 from git hash 4f3a57a1, RocksDB 8.7.0 at git hash 29005f0b</li></ul><li>fbmy5635_rel_clang14_221222</li><ul><li>compiled with clang 14.0.0 from git hash 4f3a57a1, RocksDB 8.7.0 at git hash 29005f0b</li></ul><li>fbmy5635_rel_clang15_221222</li><ul><li>compiled with clang 15.0.7 from git hash 4f3a57a1, RocksDB 8.7.0 at git hash 29005f0b</li></ul></ul>MyRocks 8.0.28<br /><ul style="text-align: left;"><li>fbmy8028_rel_221222</li><ul><li>compiled with gcc 11.4.0 from git hash 2ad105fc, RocksDB 8.7.0 at git hash 29005f0b</li></ul><li>fbmy8028_rel_clang14_221222</li><ul><li>compiled with clang 14.0.0 from git hash 2ad105fc, RocksDB 8.7.0 at git hash 29005f0b</li></ul><li>fbmy8028_rel_clang15_221222</li><ul><li>compiled with clang 15.0.7 from git hash 2ad105fc, RocksDB 8.7.0 at git hash 29005f0b</li></ul></ul>MyRocks 8.0.32<br /><ul style="text-align: left;"><li>fbmy8032_rel_221222</li><ul><li>compiled with gcc 11.4.0 from git hash 76707b44, RocksDB 8.7.0 at 
git hash 29005f0b</li></ul><li>fbmy8032_rel_clang14_221222</li><ul><li>compiled with clang 14.0.0 from git hash 76707b44, RocksDB 8.7.0 at git hash 29005f0b</li></ul><li>fbmy8032_rel_clang15_221222</li><ul><li>compiled with clang 15.0.7 from git hash 76707b44, RocksDB 8.7.0 at git hash 29005f0b</li></ul></ul><div>The older MyRocks 5.6 builds are</div><div></div></div><div><ul><li>fbmy5635_rel_202104072149</li><ul><li>compiled from code as of 2021-04-07 at git hash f896415f with RocksDB 6.19.0</li></ul><li>fbmy5635_rel_202203072101</li><ul><li>compiled from code as of 2022-03-07 at git hash e7d976ee with RocksDB 6.28.2</li></ul><li>fbmy5635_rel_202205192101</li><ul><li>compiled from code as of 2022-05-19 at git hash d503bd77 with RocksDB 7.2.2</li></ul><li>fbmy5635_rel_202208092101</li><ul><li>compiled from code as of 2022-08-09 at git hash 877a0e58 with RocksDB 7.3.1</li></ul><li>fbmy5635_rel_202210112144</li><ul><li>compiled from code as of 2022-10-11 at git hash c691c716 with RocksDB 7.3.1</li></ul><li>fbmy5635_rel_202302162102</li><ul><li>compiled from code as of 2023-02-16 at git hash 21a2b0aa with RocksDB 7.10.0</li></ul><li>fbmy5635_rel_202304122154</li><ul><li>compiled from code as of 2023-04-12 at git hash 205c31dd with RocksDB 7.10.2</li></ul><li>fbmy5635_rel_202305292102</li><ul><li>compiled from code as of 2023-05-29 at git hash b739eac1 with RocksDB 8.2.1</li></ul><li>fbmy5635_rel_20230529_832</li><ul><li>compiled from code as of 2023-05-29 at git hash b739eac1 with RocksDB 8.3.2</li></ul><li>fbmy5635_rel_20230529_843</li><ul><li>compiled from code as of 2023-05-29 at git hash b739eac1 with RocksDB 8.4.3</li></ul><li>fbmy5635_rel_20230529_850</li><ul><li>compiled from code as of 2023-05-29 at git hash b739eac1 with RocksDB 8.5.0</li></ul></ul></div><div>Most tests used the cza1_bee my.cnf files that are here <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/nuc8i7.ub1804/fbmy56/etc/my.cnf.cza1_bee">for 5.6.35</a> and <a 
href="https://github.com/mdcallag/mytools/blob/master/bench/conf/nuc8i7.ub1804/fbmy80/etc/my.cnf.cza1_bee">for 8.0</a>. Some 8.0 tests used the cza1ps0_bee my.cnf file that disables the perf schema; it <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/nuc8i7.ub1804/fbmy80/etc/my.cnf.cza1ps0_bee">is here</a>.</div></div></div><div><br /></div><div><div><b>Benchmark</b></div><div> </div><div>The test server is a Beelink SER 4700u with 8 cores, 16G RAM, Ubuntu 22.04, XFS and 1 m.2 device. The benchmark is run with 1 client to avoid over-subscribing the CPU.</div><div><br /></div><div>I used the updated Insert Benchmark so there are more benchmark steps described below. In order, the benchmark steps are:</div><p></p><div><ul><li>l.i0</li><ul><li>insert 20 million rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client.</li></ul><li>l.x</li><ul><li>create 3 secondary indexes per table. There is one connection per client.</li></ul><li>l.i1</li><ul><li>use 2 connections/client. One inserts 50M rows and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate.</li></ul><li>l.i2</li><ul><li>like l.i1 but each transaction modifies 5 rows (small transactions).</li></ul><li>qr100</li><ul><li>use 3 connections/client. One does range queries for 1800 seconds and performance is reported for this. The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. This step is run for a fixed amount of time. If the target insert rate is not sustained then that is considered to be an SLA failure.
If the target insert rate is sustained then the step does the same number of inserts for all systems tested.</li></ul><li>qp100</li><ul><li>like qr100 except uses point queries on the PK index</li></ul><li>qr500</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qp500</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qr1000</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul><li>qp1000</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul></ul><div><div><b>Results</b></div><div><br /></div><div>The performance reports are here for</div><div><ul><li><a href="https://mdcallag.github.io/reports/24_01_01.1u.1tno.bee.cached.fbmy56/all.html">MyRocks 5.6.35</a></li><li><a href="https://mdcallag.github.io/reports/24_01_01.1u.1tno.bee.cached.fbmy8028/all.html">MyRocks 8.0.28</a></li><li><a href="https://mdcallag.github.io/reports/24_01_01.1u.1tno.bee.cached.fbmy8032/all.html">MyRocks 8.0.32</a></li><li><a href="https://mdcallag.github.io/reports/24_01_01.1u.1tno.bee.cached.fbmy80/all.html">MyRocks 8.0</a></li><li><a href="https://mdcallag.github.io/reports/24_01_01.1u.1tno.bee.cached.fbmyall/all.html">MyRocks 5.6 and 8.0</a></li></ul></div><div>The summary has 3 tables. The first shows absolute throughput for each DBMS tested and benchmark step. The second has throughput relative to the version on the first row of the table. The third shows the background insert rate for benchmark steps that have background inserts; all systems sustained the target rates. The second table makes it easy to see how performance changes over time.</div><div><br /></div><div>Below I use relative QPS to explain how performance changes. It is: (QPS for $me / QPS for $base) where $me is my version and $base is the version of the base case. When relative QPS is > 1.0 then performance improved over time.
When it is < 1.0 then there are regressions. The Q in relative QPS measures: </div><div><ul><li>insert/s for l.i0, l.i1, l.i2</li><li>indexed rows/s for l.x</li><li>range queries/s for qr100, qr500, qr1000</li><li>point queries/s for qp100, qp500, qp1000</li></ul><div>From <a href="https://mdcallag.github.io/reports/24_01_01.1u.1tno.bee.cached.fbmy56/all.html#summary">the summary</a> for 5.6.35</div></div></div></div></div><div><ul style="text-align: left;"><li>On l.x (index create) the clang 14/15 builds are slower, probably because there is a <a href="https://github.com/llvm/llvm-project/issues/55153">codegen perf bug</a> in clang that is fixed in more recent releases.</li><li>Not much changes for most benchmark steps, except for the qr* steps that do range queries. I don't know yet whether this is a real regression or noise.</li><li>Throughput in fbmy5635_rel_221222 relative to fbmy5635_rel_202104072149</li><ul><li>l.i0 - relative QPS is <span style="background-color: #f4cccc;">0.96</span></li><li>l.x - relative QPS is <span style="background-color: #f4cccc;">0.97</span></li><li>l.i1, l.i2 - relative QPS is <span style="background-color: #f4cccc;">0.98</span>, <span style="background-color: #f4cccc;">0.96</span></li><li>qr100, qr500, qr1000 - relative QPS is <span style="background-color: #f4cccc;">0.59</span>, <span style="background-color: #f4cccc;">0.53</span>, <span style="background-color: #f4cccc;">0.51</span> </li><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #f4cccc;">0.96</span>, <span style="background-color: #f4cccc;">0.99</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.99</span></li></ul></ul><div><div>From <a href="https://mdcallag.github.io/reports/24_01_01.1u.1tno.bee.cached.fbmy8028/all.html#summary">the summary</a> for 8.0.28</div><div><ul><li>On l.x (index create) the clang 14/15 builds are slower, probably because there is a <a 
href="https://github.com/llvm/llvm-project/issues/55153">codegen perf bug</a> in clang that is fixed in more recent releases.</li><li>Results are mixed from the cza1ps0_bee my.cnf that disables the perf schema</li></ul><div><div>From <a href="https://mdcallag.github.io/reports/24_01_01.1u.1tno.bee.cached.fbmy8032/all.html#summary">the summary</a> for 8.0.32</div><div><ul><li>On l.x (index create) the clang 14/15 builds are slower, probably because there is a <a href="https://github.com/llvm/llvm-project/issues/55153">codegen perf bug</a> in clang that is fixed in more recent releases.</li><li>Results are good from the cza1ps0_bee my.cnf that disables the perf schema</li></ul><div><div>From <a href="https://mdcallag.github.io/reports/24_01_01.1u.1tno.bee.cached.fbmy80/all.html#summary">the summary</a> for 8.0</div><div><ul style="text-align: left;"><li>I need to figure out whether the differences in the qr* steps that do range queries are noise or regressions. I suspect this is noise.</li><li>Throughput in fbmy8032_rel_221222 relative to fbmy8028_rel_221222</li><ul><li>l.i0 - relative QPS is <span style="background-color: #f4cccc;">0.95</span></li><li>l.x - relative QPS is <span style="background-color: #f4cccc;">0.99</span></li><li>l.i1, l.i2 - relative QPS is <span style="background-color: #f4cccc;">0.97</span>, <span style="background-color: #f4cccc;">0.97</span></li><li>qr100, qr500, qr1000 - relative QPS is <span style="background-color: #f4cccc;">0.80</span>, <span style="background-color: #f4cccc;">0.91</span>, <span style="background-color: #d9ead3;">1.16</span> </li><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #f4cccc;">0.95</span>, <span style="background-color: #f4cccc;">0.96</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.99</span></li></ul></ul><div><div>From <a href="https://mdcallag.github.io/reports/24_01_01.1u.1tno.bee.cached.fbmyall/all.html#summary">the summary</a> for 
5.6 and 8.0</div><div><ul><li>Throughput in fbmy8032_rel_221222 relative to fbmy5635_rel_221222</li><ul><li>l.i0 - relative QPS is <span style="background-color: #f4cccc;">0.66</span></li><li>l.x - relative QPS is <span style="background-color: #f4cccc;">0.86</span></li><li>l.i1, l.i2 - relative QPS is <span style="background-color: #f4cccc;">0.80</span>, <span style="background-color: #f4cccc;">0.78</span></li><li>qr100, qr500, qr1000 - relative QPS is <span style="background-color: #f4cccc;">0.59</span>, <span style="background-color: #f4cccc;">0.48</span>, <span style="background-color: #f4cccc;">0.51</span> </li><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #f4cccc;">0.84</span>, <span style="background-color: #f4cccc;">0.87</span><span style="background-color: white;">, </span><span style="background-color: #f4cccc;">0.89</span></li></ul></ul></div></div></div></div></div></div></div></div></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div><div><br /></div>Mark Callaghanhttp://www.blogger.com/profile/09590445221922043181noreply@blogger.com0tag:blogger.com,1999:blog-9149523927864751087.post-28056286771438527452024-01-02T10:41:00.000-08:002024-01-02T14:44:01.389-08:00Updated Insert benchmark: Postgres 9.x to 16.x, small server, cached database<p>This has results for the <a href="https://smalldatum.blogspot.com/2023/12/updates-for-insert-benchmark-december.html">Insert Benchmark</a> using Postgres versions 9.x through 16.x using a small server and cached workload. The benchmark code has been updated since my <a href="https://smalldatum.blogspot.com/2023/09/postgres-160-vs-insert-benchmark-on.html">last blog post</a> for PG vs the Insert Benchmark on small servers. 
I also included results for the latest point releases from Postgres versions 9.0, 9.1, 9.2, 9.3, 9.4, 9.5, 9.6 and 10. Because time is finite, I didn't include results from these versions in <a href="https://smalldatum.blogspot.com/2023/10/postgres-vs-mysql-impact-of-cpu.html">my post</a> about CPU performance regressions.</p><p>tl;dr</p><p></p><ul style="text-align: left;"><li>Comparing Postgres 16.1 to 9.0.23, all benchmark steps are faster in 16.1 except for point queries, which are ~2% slower on one small server and ~10% slower on the other. This regression arrived in 9.6 and perf has been stable since then.</li><li>For write-heavy workloads there were regressions in the 9.X releases, but since then perf has been improving with a few exceptions (like PG 13).<br /></li><li>Perf for write-heavy workloads improved a lot starting in Postgres 9.5.</li></ul><div><b>Build + Configuration</b><br /></div><p></p><p></p><div><div>I compiled Postgres from source for each version using <a href="https://github.com/mdcallag/mytools/blob/master/bench/build/mar23/beelink/pg152/mk.pg.def">this script</a>. The config files are linked below for the SER4 server. The configs for SER7 are the same except shared_buffers is increased from 10G to 23G. 
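To be concrete, the intended SER4 vs SER7 difference amounts to a one-line change. This is a hypothetical postgresql.conf fragment, not the actual conf.diff files linked below; only the shared_buffers values come from the text above:

```ini
# Hypothetical fragment; the real tuning is in the linked conf.diff files.
# SER4 (16G RAM):
shared_buffers = 10GB
# SER7 (32G RAM) uses the same config except:
# shared_buffers = 23GB
```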
I tried to make them as similar as possible:</div><div><ul style="text-align: left;"><li>9.0.23 - <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/nuc8i7.ub1804/pg9/conf.diff.cx9a2_bee.90">config file</a></li><li>9.1.24 - <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/nuc8i7.ub1804/pg9/conf.diff.cx9a2_bee.90">config file</a></li><li>9.2.24 - <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/nuc8i7.ub1804/pg9/conf.diff.cx9a2_bee.92">config file</a></li><li>9.3.25 - <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/nuc8i7.ub1804/pg9/conf.diff.cx9a2_bee.92">config file</a></li><li>9.4.26 - <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/nuc8i7.ub1804/pg9/conf.diff.cx9a2_bee.94">config file</a></li><li>9.5.25 - <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/nuc8i7.ub1804/pg9/conf.diff.cx9a2_bee.95">config file</a></li><li>9.6.24 - <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/nuc8i7.ub1804/pg9/conf.diff.cx9a2_bee.96">config file</a></li><li>10.23 - <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/nuc8i7.ub1804/pg10/conf.diff.cx9a2_bee">config file</a></li><li>11.22 - <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/nuc8i7.ub1804/pg11/conf.diff.cx9a2_bee">config file</a></li><li>12.17 - <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/nuc8i7.ub1804/pg12/conf.diff.cx9a2_bee">config file</a></li><li>13.13 - <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/nuc8i7.ub1804/pg13/conf.diff.cx9a2_bee">config file</a></li><li>14.10 - <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/nuc8i7.ub1804/pg14/conf.diff.cx9a2_bee">config file</a></li><li>15.5 - <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/nuc8i7.ub1804/pg15/conf.diff.cx9a2_bee">config file</a></li><li>16.1 - <a 
href="https://github.com/mdcallag/mytools/blob/master/bench/conf/nuc8i7.ub1804/pg16/conf.diff.cx9a2_bee">config file</a></li></ul></div><div><b>Benchmark</b></div></div><div><br /></div><div>The benchmark was run with 1 client using my old and new small servers.</div><div><ul style="text-align: left;"><li>SER4 - The old small server is a Beelink SER 4700u <a href="https://smalldatum.blogspot.com/2022/10/small-servers-for-performance-testing-v4.html">described here</a> that has 8 cores, hyperthreads disabled, 16G RAM, Ubuntu 22.04 and XFS using an NVMe SSD. </li><li>SER7 - The new small server is a Beelink SER7 7840HS <a href="https://smalldatum.blogspot.com/2022/10/small-servers-for-performance-testing-v4.html">described here</a> that has 8 cores, hyperthreads disabled, 32G RAM, Ubuntu 22.04 and XFS using an NVMe SSD.</li></ul></div><div>I used the updated Insert Benchmark so there are more benchmark steps described below. In order, the benchmark steps are:</div><div><p></p><div><ul><li>l.i0</li><ul><li>insert X million rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client. X is 20M for SER4 and 40M for SER7.</li></ul><li>l.x</li><ul><li>create 3 secondary indexes per table. There is one connection per client.</li></ul><li>l.i1</li><ul><li>use 2 connections/client. One inserts 50M rows and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate.</li></ul><li>l.i2</li><ul><li>like l.i1 but each transaction modifies 5 rows (small transactions).</li></ul><li>qr100</li><ul><li>use 3 connections/client. One does range queries for 1800 seconds and performance is reported for this. The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. 
This step is run for a fixed amount of time. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested.</li></ul><li>qp100</li><ul><li>like qr100 except uses point queries on the PK index</li></ul><li>qr500</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qp500</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qr1000</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul><li>qp1000</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul></ul></div><p></p><div><b>Results</b></div><div><br /></div><div>The performance report is here <a href="https://mdcallag.github.io/reports/24_01_01.1u.1tno.bee.cached.pg/all.html">for SER4</a> and <a href="https://mdcallag.github.io/reports/24_01_01.1u.1tno.ser7.cached.pg/all.html">for SER7</a>. It has a lot more detail including charts, tables and metrics from iostat and vmstat to help explain the performance differences.</div><div><br /></div><div><div>The summary has 3 tables. The first shows absolute throughput by DBMS tested X benchmark step. The second has throughput relative to the version on the first row of the table. The third shows the background insert rate for benchmark steps with background inserts and all systems sustained the target rates. The second table makes it easy to see how performance changes over time.</div><div><br /></div><div>Below I use relative QPS to explain how performance changes. It is: (QPS for $me / QPS for $base) where $me is my version and $base is the version of the base case. When relative QPS is > 1.0 then performance improved over time. When it is < 1.0 then there are regressions. 
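To make the relative QPS arithmetic concrete, here is a small sketch in Python. The throughput numbers are invented for illustration; the real values come from the linked summaries:

```python
# Relative QPS = (QPS for $me) / (QPS for $base), computed per benchmark step.
# The numbers below are made up for illustration only.
base = {"l.i0": 10000, "qr100": 5000}  # $base: the version on the first row of the summary
me = {"l.i0": 12300, "qr100": 4500}    # $me: the version being compared

relative_qps = {step: me[step] / base[step] for step in base}
# A value > 1.0 means $me improved on $base for that step; < 1.0 is a regression.
```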
The Q in relative QPS measures: </div><div><ul><li>insert/s for l.i0, l.i1, l.i2</li><li>indexed rows/s for l.x</li><li>range queries/s for qr100, qr500, qr1000</li><li>point queries/s for qp100, qp500, qp1000</li></ul><div>From <a href="https://mdcallag.github.io/reports/24_01_01.1u.1tno.bee.cached.pg/all.html#summary">the summary</a> for SER4</div></div></div><div><ul style="text-align: left;"><li>The base case is Postgres 9.0.23</li><li>There are no regressions in Postgres 14, 15 & 16 relative to Postgres 9</li><li>There are regressions for some write-heavy benchmark steps from Postgres 9.0 to 9.6</li><li>Postgres 13.13 isn't great for write-heavy (see l.i2)</li><li>For read-heavy, modern Postgres is better at range queries than at point queries relative to older Postgres</li><li>Throughput per benchmark step in Postgres 16.1 relative to 9.0.23</li><ul><li>l.i0 - relative QPS is <span style="background-color: #d9ead3;">1.23</span></li><li>l.x - relative QPS is <span style="background-color: #d9ead3;">1.71</span></li><li>l.i1, l.i2 - relative QPS is <span style="background-color: #d9ead3;">3.44</span>, <span style="background-color: #d9ead3;">2.18</span></li><li>qr100, qr500, qr1000 - relative QPS is <span style="background-color: #d9ead3;">1.16</span>, <span style="background-color: #d9ead3;">1.21</span>, <span style="background-color: #d9ead3;">1.27</span></li><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #d9ead3;">1.11</span>, <span style="background-color: #d9ead3;">1.03</span>, <span style="background-color: #f4cccc;">0.98</span></li></ul></ul></div><div>From <a href="https://mdcallag.github.io/reports/24_01_01.1u.1tno.ser7.cached.pg/all.html#summary">the summary</a> for SER7</div><div><ul><li>The base case is Postgres 9.0.23</li><li>There are small regressions for point queries in Postgres 14, 15 & 16 relative to Postgres 9</li><li>There are regressions for some write-heavy benchmark steps from Postgres 9.0 to 9.6</li><li>Postgres 13.13 
isn't great for write-heavy (see l.i2)</li><li>For read-heavy, modern Postgres is better at range queries than at point queries relative to older Postgres</li><li>Throughput per benchmark step in Postgres 16.1 relative to 9.0.23</li><ul><li>l.i0 - relative QPS is <span style="background-color: #d9ead3;">1.53</span></li><li>l.x - relative QPS is <span style="background-color: #d9ead3;">1.69</span></li><li>l.i1, l.i2 - relative QPS is <span style="background-color: #d9ead3;">4.05</span>, <span style="background-color: #d9ead3;">3.52</span></li><li>qr100, qr500, qr1000 - relative QPS is <span style="background-color: #d9ead3;">1.38</span>, <span style="background-color: #d9ead3;">1.61</span>, <span style="background-color: #d9ead3;">1.52</span></li><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #d9ead3;">1.01</span>, <span style="background-color: #f4cccc;">0.93</span>, <span style="background-color: #f4cccc;">0.88</span></li></ul></ul></div></div><div><br /></div>Mark Callaghanhttp://www.blogger.com/profile/09590445221922043181noreply@blogger.com0tag:blogger.com,1999:blog-9149523927864751087.post-59907496103233307922024-01-01T16:54:00.000-08:002024-01-02T14:44:07.335-08:00Updated Insert benchmark: MyRocks 5.6 and 8.0, medium/large server, cached database<p>This has results for the <a href="https://smalldatum.blogspot.com/2023/12/updates-for-insert-benchmark-december.html">Insert Benchmark</a> with MyRocks 5.6.35, 8.0.28 and 8.0.32, a medium/large server and a cached workload. 
</p><p>tl;dr</p><ul style="text-align: left;"><li>For read-heavy benchmark steps disabling the perf schema improves performance by ~5%</li><li>There might be a small regression (~3%) for point queries from 8.0.28 to 8.0.32</li><li>Throughput in MyRocks 8.0.32 relative to 5.6.35 by benchmark step</li><ul><li>l.i0 - MyRocks 8.0.32 is <span style="background-color: #f4cccc;">~16% slower</span></li><li>l.x - MyRocks 8.0.32 is <span style="background-color: #d9ead3;">~3% faster</span></li><li>l.i1, l.i2 - MyRocks 8.0.32 is <span style="background-color: #d9ead3;">3%, 26% faster</span></li><li>range queries - MyRocks 8.0.32 is <span style="background-color: #d9ead3;">~15% faster</span> </li><li>point queries - MyRocks 8.0.32 is <span style="background-color: #f4cccc;">~4% slower</span></li></ul></ul><p><b>Small, medium, medium/large and large</b></p><p>I have been describing my test servers as small, medium and large and now I am using medium/large. What does this mean? I will wave my hand and make up definitions:</p><p></p><ul style="text-align: left;"><li>small - fewer than 10 CPU cores</li><li>medium - fewer than 20 CPU cores</li><li>medium/large - fewer than 30 CPU cores</li><li>large - at least 30 CPU cores</li></ul><p></p><div><b>Build + Configuration</b><br /></div><div><p></p><p></p><p></p><div></div><p></p><div><div>I tested MyRocks 5.6.35, 8.0.28 and 8.0.32 using the latest code as of December 2023. These were compiled from source. 
All builds use CMAKE_BUILD_TYPE=Release.</div><div><br /></div><div>The versions tested were:</div><div><ul><li>MyRocks 5.6.35 (fbmy5635_rel_221222)</li><ul><li>compiled from git hash 4f3a57a1, RocksDB 8.7.0 at git hash 29005f0b</li><li>used the <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/nuc8i7.ub1804/fbmy56/etc/my.cnf.cza1_c24r64">cza1_c24r64</a> my.cnf file</li></ul><li>MyRocks 8.0.28 (fbmy8028_rel_221222)</li><ul><li>compiled from git hash 2ad105fc, RocksDB 8.7.0 at git hash 29005f0b</li><li>used the <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/nuc8i7.ub1804/fbmy80/etc/my.cnf.cza1_c24r64">cza1_c24r64</a> and <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/nuc8i7.ub1804/fbmy80/etc/my.cnf.cza1ps0_c24r64">cza1ps0_c24r64</a> my.cnf files</li></ul><li>MyRocks 8.0.32 (fbmy8032_rel_221222)</li><ul><li>compiled from git hash 76707b44, RocksDB 8.7.0 at git hash 29005f0b</li><li>used the <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/nuc8i7.ub1804/fbmy80/etc/my.cnf.cza1_c24r64">cza1_c24r64</a> and <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/nuc8i7.ub1804/fbmy80/etc/my.cnf.cza1ps0_c24r64">cza1ps0_c24r64</a> my.cnf files</li></ul></ul><div>The cza1_c24r64 and cza1ps0_c24r64 my.cnf files differ in one way -- cza1_c24r64 enables the perf schema while cza1ps0_c24r64 disables it.</div></div></div></div><div><br /></div><div><div><b>Benchmark</b></div><div> </div><div>The test server is a SuperMicro SuperWorkstation (Sys-7049A-T) with 2 sockets, 12 cores/socket, hyperthreads disabled, 64G RAM, Ubuntu 22.04 and XFS using a 2TB NVMe m.2 device. The benchmark is run with 12 clients to avoid over-subscribing the CPU. Next time I might use 16.</div><div><br /></div><div>I used the updated Insert Benchmark so there are more benchmark steps described below. In order, the benchmark steps are:</div><p></p><div><ul><li>l.i0</li><ul><li>insert 20 million rows per table in PK order. 
The table has a PK index but no secondary indexes. There is one connection per client.</li></ul><li>l.x</li><ul><li>create 3 secondary indexes per table. There is one connection per client.</li></ul><li>l.i1</li><ul><li>use 2 connections/client. One inserts 50M rows and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate.</li></ul><li>l.i2</li><ul><li>like l.i1 but each transaction modifies 5 rows (small transactions).</li></ul><li>qr100</li><ul><li>use 3 connections/client. One does range queries for 1200 seconds and performance is reported for this. The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. This step is run for a fixed amount of time. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested.</li></ul><li>qp100</li><ul><li>like qr100 except uses point queries on the PK index</li></ul><li>qr500</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qp500</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qr1000</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul><li>qp1000</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul></ul><div><div><b>Results</b></div><div><br /></div><div>The performance report <a href="https://mdcallag.github.io/reports/24_01_01.12u.1tno.socket2.cached.fbmy/all.html">is here</a>.</div><div><br /></div><div>The summary has 3 tables. The first shows absolute throughput by DBMS tested X benchmark step. 
The second has throughput relative to the version on the first row of the table. The third shows the background insert rate for benchmark steps with background inserts and all systems sustained the target rates. The second table makes it easy to see how performance changes over time.</div><div><br /></div><div>Below I use relative QPS to explain how performance changes. It is: (QPS for $me / QPS for $base) where $me is my version and $base is the version of the base case. When relative QPS is > 1.0 then performance improved over time. When it is < 1.0 then there are regressions. The Q in relative QPS measures: </div><div><ul><li>insert/s for l.i0, l.i1, l.i2</li><li>indexed rows/s for l.x</li><li>range queries/s for qr100, qr500, qr1000</li><li>point queries/s for qp100, qp500, qp1000</li></ul><div>From <a href="https://mdcallag.github.io/reports/24_01_01.12u.1tno.socket2.cached.fbmy/all.html#summary">the summary</a></div></div><div><ul><li>The base case is fbmy5635_rel_221222</li><li>For the read-heavy benchmark steps disabling the perf schema improves performance by ~5%</li><li>There might be a small regression (~3%) for point queries from 8.0.28 to 8.0.32</li><li>Throughput in fbmy8032_rel_221222 relative to the base case</li><ul><li>l.i0 - relative QPS is <span style="background-color: #f4cccc;">0.84</span></li><li>l.x - relative QPS is <span style="background-color: #d9ead3;">1.03</span></li><li>l.i1, l.i2 - relative QPS is <span style="background-color: #d9ead3;">1.03</span>, <span style="background-color: #d9ead3;">1.26</span></li><li>qr100, qr500, qr1000 - relative QPS is <span style="background-color: #d9ead3;">1.16</span>, <span style="background-color: #d9ead3;">1.13</span>, <span style="background-color: #d9ead3;">1.18</span> </li><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #f4cccc;">0.96</span>, <span style="background-color: #f4cccc;">0.96</span>, <span style="background-color: 
#f4cccc;">0.97</span></li></ul></ul></div><div><ul></ul></div></div></div></div><p><br /></p>Mark Callaghanhttp://www.blogger.com/profile/09590445221922043181noreply@blogger.com0tag:blogger.com,1999:blog-9149523927864751087.post-18581953707287892332024-01-01T15:10:00.000-08:002024-01-02T14:43:53.683-08:00Updated Insert benchmark: MyRocks 5.6 and 8.0, medium server, cached database<p>This has results for the <a href="https://smalldatum.blogspot.com/2023/12/updates-for-insert-benchmark-december.html">Insert Benchmark</a> using MyRocks 5.6 and 8.0 on a medium server and cached workload. This is my first report that includes MyRocks 8.0.32.</p>For old MyRocks 5.6.35 vs latest 5.6.35<br /><ul style="text-align: left;"><li>Throughput is similar except for range queries where there might be a small regression of ~7%</li></ul><div>For latest MyRocks 8.0.28 vs latest MyRocks 8.0.32</div><div><ul style="text-align: left;"><li>Throughput is similar but there might be a small regression for point queries of ~5%</li></ul></div><div>For latest MyRocks 5.6.35 vs latest MyRocks 8.0.32</div><div><ul style="text-align: left;"><li>Throughput in 8.0.32 is worse for write-heavy and better for read-heavy</li><li>For write-heavy the difference is <= 3% for l.x, l.i1, l.i2 and ~18% for l.i0</li><li>For read-heavy the difference is between 5% and 9%</li></ul></div><div><b>Build + Configuration</b><br /></div><p></p><p></p><p></p><div></div><p></p><div><div>I tested MyRocks 5.6.35, 8.0.28 and 8.0.32 using the latest code as of December 2023. I also repeated tests for older builds of MyRocks 5.6. These were compiled from source. 
All builds use CMAKE_BUILD_TYPE=Release.</div><div><br /></div><div>For the builds with the latest version of MyRocks I used:</div><div><ul><li>MyRocks 5.6.35 (fbmy5635_rel_221222)</li><ul><li>compiled from git hash 4f3a57a1, RocksDB 8.7.0 at git hash 29005f0b</li></ul><li>MyRocks 8.0.28 (fbmy8028_rel_221222)</li><ul><li>compiled from git hash 2ad105fc, RocksDB 8.7.0 at git hash 29005f0b</li></ul><li>MyRocks 8.0.32 (fbmy8032_rel_221222)</li><ul><li>compiled from git hash 76707b44, RocksDB 8.7.0 at git hash 29005f0b</li></ul></ul><div>The older MyRocks 5.6 builds are:</div><div></div></div><div><ul style="text-align: left;"><li>fbmy5635_rel_202104072149</li><ul><li>compiled from code as of 2021-04-07 at git hash f896415f with RocksDB 6.19.0</li></ul><li>fbmy5635_rel_202203072101</li><ul><li>compiled from code as of 2022-03-07 at git hash e7d976ee with RocksDB 6.28.2</li></ul><li>fbmy5635_rel_202205192101</li><ul><li>compiled from code as of 2022-05-19 at git hash d503bd77 with RocksDB 7.2.2</li></ul><li>fbmy5635_rel_202208092101</li><ul><li>compiled from code as of 2022-08-09 at git hash 877a0e58 with RocksDB 7.3.1</li></ul><li>fbmy5635_rel_202210112144</li><ul><li>compiled from code as of 2022-10-11 at git hash c691c716 with RocksDB 7.3.1</li></ul><li>fbmy5635_rel_202302162102</li><ul><li>compiled from code as of 2023-02-16 at git hash 21a2b0aa with RocksDB 7.10.0</li></ul><li>fbmy5635_rel_202304122154</li><ul><li>compiled from code as of 2023-04-12 at git hash 205c31dd with RocksDB 7.10.2</li></ul><li>fbmy5635_rel_202305292102</li><ul><li>compiled from code as of 2023-05-29 at git hash b739eac1 with RocksDB 8.2.1</li></ul><li>fbmy5635_rel_20230529_832</li><ul><li>compiled from code as of 2023-05-29 at git hash b739eac1 with RocksDB 8.3.2</li></ul><li>fbmy5635_rel_20230529_843</li><ul><li>compiled from code as of 2023-05-29 at git hash b739eac1 with RocksDB 8.4.3</li></ul><li>fbmy5635_rel_20230529_850</li><ul><li>compiled from code as of 2023-05-29 at git hash 
b739eac1 with RocksDB 8.5.0</li></ul></ul></div><div>Most tests used the cza1_gcp_c2s30 my.cnf files that are here <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/nuc8i7.ub1804/fbmy56/etc/my.cnf.cza1_gcp_c2s30">for 5.6.35</a> and <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/nuc8i7.ub1804/fbmy80/etc/my.cnf.cza1_gcp_c2s30">for 8.0</a>. Some 8.0 tests used the cza1ps0_gcp_c2s30 my.cnf file that disables the perf schema and <a href="https://github.com/mdcallag/mytools/blob/master/bench/conf/nuc8i7.ub1804/fbmy80/etc/my.cnf.cza1ps0_gcp_c2s30">is here</a>.</div><div><br /></div></div><div><div><b>Benchmark</b></div><div> </div><div>The test server is a c2-standard-30 from GCP with 15 cores, hyperthreads disabled, 128G of RAM, Ubuntu 22.04 and XFS on SW RAID 0 over 4 local SSDs. The benchmark is run with 8 clients to avoid over-subscribing the CPU.</div><div><br /></div><div>I used the updated Insert Benchmark so there are more benchmark steps described below. In order, the benchmark steps are:</div><p></p><div><ul><li>l.i0</li><ul><li>insert 20 million rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client.</li></ul><li>l.x</li><ul><li>create 3 secondary indexes per table. There is one connection per client.</li></ul><li>l.i1</li><ul><li>use 2 connections/client. One inserts 50M rows and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate.</li></ul><li>l.i2</li><ul><li>like l.i1 but each transaction modifies 5 rows (small transactions).</li></ul><li>qr100</li><ul><li>use 3 connections/client. One does range queries for 1200 seconds and performance is reported for this. The second does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. 
The range queries use covering secondary indexes. This step is run for a fixed amount of time. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested.</li></ul><li>qp100</li><ul><li>like qr100 except uses point queries on the PK index</li></ul><li>qr500</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qp500</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 500/s</li></ul><li>qr1000</li><ul><li>like qr100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul><li>qp1000</li><ul><li>like qp100 but the insert and delete rates are increased from 100/s to 1000/s</li></ul></ul><div><div><b>Results</b></div><div><br /></div><div>The performance reports are here for</div><div><ul style="text-align: left;"><li><a href="https://mdcallag.github.io/reports/24_01_01.8u.1tno.c2.cached.fbmy56/all.html">MyRocks 5.6</a> </li><li><a href="https://mdcallag.github.io/reports/24_01_01.8u.1tno.c2.cached.fbmy80/all.html">MyRocks 8.0</a></li><li><a href="https://mdcallag.github.io/reports/24_01_01.8u.1tno.c2.cached.fbmy_all/all.html">MyRocks 5.6 & 8.0</a> with many 5.6 versions</li><li><a href="https://mdcallag.github.io/reports/24_01_01.8u.1tno.c2.cached.fbmy_latest/all.html">MyRocks 5.6 & 8.0</a> with the latest versions</li></ul></div><div>The summary has 3 tables. The first shows absolute throughput by DBMS tested X benchmark step. The second has throughput relative to the version on the first row of the table. The third shows the background insert rate for benchmark steps with background inserts and all systems sustained the target rates. The second table makes it easy to see how performance changes over time.</div><div><br /></div><div>Below I use relative QPS to explain how performance changes. 
It is: (QPS for $me / QPS for $base) where $me is my version and $base is the version of the base case. When relative QPS is > 1.0 then performance improved over time. When it is < 1.0 then there are regressions. The Q in relative QPS measures: </div><div><ul><li>insert/s for l.i0, l.i1, l.i2</li><li>indexed rows/s for l.x</li><li>range queries/s for qr100, qr500, qr1000</li><li>point queries/s for qp100, qp500, qp1000</li></ul><div>From the summary <a href="https://mdcallag.github.io/reports/24_01_01.8u.1tno.c2.cached.fbmy56/all.html#summary">for 5.6</a></div></div><div><ul style="text-align: left;"><li>The base case is fbmy5635_rel_202104072149</li><li>Throughput in fbmy5635_rel_221222 is similar to the base case, except for range queries where there might be a small regression of ~7%</li><ul><li>l.i0 - relative QPS is <span style="background-color: #d9ead3;">1.01</span></li><li>l.x - relative QPS is <span style="background-color: #f4cccc;">0.96</span></li><li>l.i1, l.i2 - relative QPS is <span style="background-color: #f4cccc;">0.96</span>, <span style="background-color: #d9ead3;">1.01</span></li><li>qr100, qr500, qr1000 - relative QPS is <span style="background-color: #f4cccc;">0.93</span>, <span style="background-color: #f4cccc;">0.92</span>, <span style="background-color: #f4cccc;">0.99</span> </li><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #f4cccc;">0.98</span>, <span style="background-color: #d9ead3;">1.03</span>, <span style="background-color: #d9ead3;">1.01</span></li></ul></ul></div><div><div>From the summary <a href="https://mdcallag.github.io/reports/24_01_01.8u.1tno.c2.cached.fbmy80/all.html#summary">for 8.0</a></div><div><ul style="text-align: left;"><li>The base case is fbmy8028_rel_221222</li><li>Results in MyRocks 8.0.32 with the performance schema disabled are mixed</li><li>Throughput in fbmy8032_rel_221222 is mostly similar to the base case. 
There might be a small regression for point queries.</li><ul><li>l.i0 - relative QPS is <span style="background-color: #f4cccc;">0.94</span></li><li>l.x - relative QPS is <span style="background-color: #d9ead3;">1.02</span></li><li>l.i1, l.i2 - relative QPS is <span style="background-color: #f4cccc;">0.99</span>, <span style="background-color: #f4cccc;">0.97</span></li><li>qr100, qr500, qr1000 - relative QPS is <span style="background-color: #d9ead3;">1.05</span>, <span style="background-color: #d9ead3;">1.00</span>, <span style="background-color: #d9ead3;">1.02</span></li><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #f4cccc;">0.96</span>, <span style="background-color: #f4cccc;">0.96</span>, <span style="background-color: #f4cccc;">0.95</span></li></ul></ul></div></div><div><div>From the summary <a href="https://mdcallag.github.io/reports/24_01_01.8u.1tno.c2.cached.fbmy_all/all.html#summary">5.6, 8.0</a> with many versions:</div><div><ul><li>The base case is fbmy5635_rel_202104072149</li><li>Throughput in fbmy8032_rel_221222 relative to the base case is worse for write-heavy and better for read-heavy</li><ul><li>l.i0 - relative QPS is <span style="background-color: #f4cccc;">0.83</span></li><li>l.x - relative QPS is <span style="background-color: #f4cccc;">0.93</span></li><li>l.i1, l.i2 - relative QPS is <span style="background-color: #f4cccc;">0.94</span>, <span style="background-color: #f4cccc;">0.97</span></li><li>qr100, qr500, qr1000 - relative QPS is <span style="background-color: #f4cccc;">0.98</span>, <span style="background-color: #d9ead3;">1.07</span>, <span style="background-color: #d9ead3;">1.08</span> </li><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #d9ead3;">1.05</span>, <span style="background-color: #d9ead3;">1.10</span>, <span style="background-color: #d9ead3;">1.08</span></li></ul></ul><div>From the summary for <a 
href="https://mdcallag.github.io/reports/24_01_01.8u.1tno.c2.cached.fbmy_latest/all.html#summary">5.6, 8.0</a> with latest versions</div><div><ul style="text-align: left;"><li>The base case is fbmy5635_rel_221222</li><li>Throughput in fbmy8032_rel_221222 relative to the base case is worse for write-heavy and better for read-heavy</li><ul><li>l.i0 - relative QPS is <span style="background-color: #f4cccc;">0.82</span></li><li>l.x - relative QPS is <span style="background-color: #f4cccc;">0.97</span></li><li>l.i1, l.i2 - relative QPS is <span style="background-color: #f4cccc;">0.98</span>, <span style="background-color: #f4cccc;">0.97</span></li><li>qr100, qr500, qr1000 - relative QPS is <span style="background-color: #d9ead3;">1.05</span>, <span style="background-color: #d9ead3;">1.16</span>, <span style="background-color: #d9ead3;">1.09</span> </li><li>qp100, qp500, qp1000 - relative QPS is <span style="background-color: #d9ead3;">1.07</span>, <span style="background-color: #d9ead3;">1.07</span>, <span style="background-color: #d9ead3;">1.07</span></li></ul></ul></div></div></div></div></div></div>Mark Callaghanhttp://www.blogger.com/profile/09590445221922043181noreply@blogger.com0