Posts

Showing posts from June, 2024

The impact of link time optimization for MySQL with sysbench

This post has results to show the benefit from using link time optimization (LTO) for MySQL, which is enabled via the CMake option -DWITH_LTO=ON.

tl;dr
- A typical improvement from link time optimization is ~5% more QPS
- On the small servers (PN53, SER4) the benefit from link-time optimization was larger for InnoDB than for MyRocks
- On the medium server (C2D) the benefit was similar for MyRocks and InnoDB

Builds
I used InnoDB from MySQL 8.0.37 and MyRocks from FB MySQL compiled at git sha 65644b82c, which uses RocksDB 9.3.1 and was the latest as of June 12, 2024. The compiler was gcc 11.4.0.

Hardware
I tested on three servers:
- SER4 - Beelink SER 4700u (see here) with 8 cores and an AMD Ryzen 7 4700u CPU
- PN53 - ASUS ExpertCenter PN53 (see here) with 8 cores and an AMD Ryzen 7 7735HS CPU
- C2D - a c2d-highcpu-32 instance type on GCP (c2d high-CPU) with 32 vCPUs and SMT disabled, so there are 16 cores

All servers use Ubuntu 22.04 with ext4.

Benchmark
I used sysbench and my usage is explained here. The
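A minimal sketch of a source build with the LTO option enabled. The -DWITH_LTO=ON option is from the post; the source and install paths, build type, and -j parallelism are placeholders, not the author's exact build script:

```shell
# Hypothetical build sketch: enable link time optimization for MySQL.
# Paths and -j value are placeholders; only -DWITH_LTO=ON is from the post.
cmake /path/to/mysql-8.0.37 \
  -DCMAKE_BUILD_TYPE=Release \
  -DWITH_LTO=ON \
  -DCMAKE_INSTALL_PREFIX=/path/to/install
make -j8
```

With LTO the linker sees the whole program at once, so it can inline and optimize across translation units, which is where the ~5% QPS gain comes from.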

A simple test to measure CPU per IO

What should I expect with respect to CPU overhead and latency when using the public cloud? I won't name the vendor here because they might have a DeWitt Clause.

Hardware
My server has 16 real cores with HT or SMT disabled, and uses Ubuntu 22.04 with ext4 in all cases. The two IO setups tested are:
- local - 2 NVMe devices with SW RAID 0
- network - 1TB of fast cloud block storage that is backed by SSD and advertised as targeted for database workloads

Updates:
- Fixed a silly mistake in the math for CPU usecs per block read

Benchmark
This uses fio with O_DIRECT to do 4kb block reads. My benchmark script is here and is run by the following command lines; I ignore the result of the first run:

for d in 8 16 32 ; do bash run.sh local2_iod${d} /data/m/t.fio io_uring $d 300 512G ; done
for d in 4 8 16 32 ; do bash run.sh network_iod${d} /data2/t.fio io_uring $d 300 900G ; done

Results
I compute CPU usecs per block read as:
(((vmstat.us + vmstat.sy) / 100) * 16 * 1M) / IOPs
where vmstat.us, vmstat.sy - the
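The CPU-per-IO formula above can be sketched as a one-liner. The structure of the computation is from the post; the sample values for vmstat.us, vmstat.sy, and IOPs are made up for illustration (16 is the core count from the post):

```shell
# Sketch of the "CPU usecs per block read" computation:
#   (((vmstat.us + vmstat.sy) / 100) * cores * 1M) / IOPs
# us/sy are CPU busy percentages from vmstat; sample values are hypothetical.
awk -v us=10 -v sy=15 -v cores=16 -v iops=100000 \
  'BEGIN { printf "%.1f\n", ((us + sy) / 100) * cores * 1000000 / iops }'
```

Dividing the busy fraction by 100 and multiplying by the core count converts the vmstat percentages into busy cores; multiplying by 1M (microseconds per second) and dividing by IOPs yields CPU microseconds consumed per block read.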

The Insert Benchmark: Postgres 17beta1, large server, IO-bound

This post has results for the Insert Benchmark on a large server with an IO-bound workload. The goal is to compare new Postgres releases with older ones to determine whether they get better or worse over time. The results here are from a large server (32 cores, 128G RAM). Results from the same setup with a cached workload are here. This work was done by Small Datum LLC.

tl;dr
- There are no regressions from Postgres 16.3 to 17beta1 for this benchmark
- The patch to enforce VISITED_PAGES_LIMIT during get_actual_variable_range fixes the problem with variance from optimizer CPU overhead during DELETE statements, just as it did on a small server and on a large server with a cached workload. And 17beta1 with the patch gets ~12X more writes/s than without the patch.
- This is my first result with ext4. I had to switch because XFS and the 6.5 kernel (HWE enabled, Ubuntu 22.04) don't play great together for me.

Build + Configuration
This post has results from Postgres versions 10.23, 11.