Thursday, August 8, 2024

MySQL regressions: delete vs InnoDB

I started to look at CPU overheads in MyRocks and upstream InnoDB. While I am happy to file bugs for MyRocks as they are likely to be fixed, I am not sure how much energy I want to put into proper bug reports for upstream InnoDB. So I will just write blog posts about them for now.

I created flamegraphs while running sysbench with cached databases (more likely to be CPU bound) and the problem here occurs on an 8-core PN53 where sysbench was run with 1 thread. Here I use perf record -e cycles to collect data for flamegraphs and then I focus on the percentage of samples in a given function (and its callees) as a proxy for CPU overhead.

The numbers below are the percentage of perf samples from a function and its callees and this is a proxy for CPU time. Here I used cycles as the HW counter with perf but I have more tests in progress to get flamegraphs with other counters. The flamegraphs are here.

I am curious about whether more data is written to the binlog per transaction with 8.0.37, but I have to repeat tests to get that data. While my helper scripts archive results from SHOW GLOBAL STATUS which provides the Binlog_bytes_written counter, that counter is only in MariaDB and Percona Server. For upstream MySQL I have to use this much more fun query to get the answer from the perf schema.
select SUM_NUMBER_OF_BYTES_WRITE from file_summary_by_event_name where EVENT_NAME="wait/io/file/sql/binlog"
Time is spent in background and foreground work. The background work are mostly purge and writeback. The foreground work is networking, SQL and binlog. At a high level the percentage of time in SQL + binlog is about 2X larger with MySQL 8.0.37 vs 5.6.51 from ~30% to ~60%. It will take time to understand why.

5.6.51
---
17.77   mysql_delete
   0.35   QUICK_RANGE_SELECT::reset
   1.69   SQL_SELECT::test_quick_select
   0.30   THD::binlog_query
   1.40   binlog_log_row
   4.52   handler::ha_delete_row
   0.60   lock_tables
   1.16   mysql_prepare_delete
   1.28   open_normal_and_derived_tables
   0.73   optimize_cond
   4.06   rr_quick

 9.82   ha_commit_trans
   5.61   MYSQL_BIN_LOG::commit
     1.83   MYSQL_BIN_LOG::process_commit_stage_queue
     0.86   MYSQL_BIN_LOG::process_flush_stage_queue
     2.26   my_b_flush_io_cache
 3.78   ha_prepare_low

5.7.44
---
36.79   Sql_cmd_delete::execute
   0.73   QUICK_RANGE_SELECT::reset
   1.58   Sql_cmd_delete::mysql_prepare_delete
   0.60   THD::binlog_query
   3.33   binlog_log_row
   9.59   handler::ha_delete_row
   1.47   lock_tables
   2.09   open_tables_for_query
   1.03   optimize_cond
   8.65   rr_quick
   3.45   test_quick_select

18.04   ha_commit_trans
  16.28   MYSQL_BIN_LOG::commit
     3.81   MYSQL_BIN_LOG::flush_cache_to_file
     4.22   MYSQL_BIN_LOG::process_commit_stage_queue
     6.67   MYSQL_BIN_LOG::process_flush_stage_queue
       1.70   MYSQL_BIN_LOG::flush_thread_caches
       4.58   ha_flush_logs
   0.99   ha_prepare_low

8.0.37
---
37.60   Sql_cmd_dml::execute
  28.57   Sql_cmd_delete::delete_from_single_table
     6.80   IndexRangeScanIterator::Read
     0.83   THD::binlog_query
     6.89   binlog_log_row
     6.32   handler::ha_delete_row
     1.24   optimize_cond
     3.27   test_quick_select
   0.73   Sql_cmd_delete::check_privileges
   0.81   Sql_cmd_delete::restore_cmd_properties
   1.83   lock_tables
   4.04   open_tables_for_query

20.26   ha_commit_trans
  17.41   MYSQL_BIN_LOG::commit
     0.75   MYSQL_BIN_LOG::change_stage
     3.09   MYSQL_BIN_LOG::flush_cache_to_file
     3.29   MYSQL_BIN_LOG::process_commit_stage_queue
     8.37   MYSQL_BIN_LOG::process_flush_stage_queue
       5.42   MYSQL_BIN_LOG::fetch_and_process_flush_stage_queue
       2.40   MYSQL_BIN_LOG::flush_thread_caches
   1.37   MYSQL_BIN_LOG::prepare

5 comments:

  1. Jean-François GagnéAugust 9, 2024 at 10:20 AM

    Something new in binlogs in 8.0 is `transaction_write_set_extraction = XXHASH64` (it is OFF in 5.7 and does not exist in 5.6). An interesting test would be to set to OFF in 8.0 to see if this impacts performance significantly.

    Something else is `binlog-transaction-dependency-tracking`. This is exposed as a parameter in 8.0, always doing `COMMIT_ORDER` in 5.7, and inexistent in 5.6. There is no `OFF` option in 8.0. Unclear what is the overhead of this. If this is something of interest, we could check the overhead of this (one way could be to implement a `OFF` option for this parameter in 8.0 and re-run the tests).

    ReplyDelete
    Replies
    1. Starting more tests now including one that disables transaction_write_set_extraction

      Delete
    2. Repeated the test on a small server using two no configs:
      z11a_nops_bee -> disables perf schema
      z11a_twse_bee -> adds transaction_write_set_extraction=OFF

      Neither solves the problem:
      z11a_nops_bee -> improves QPS by about 5% in most cases
      z11a_twse_bee -> improves QPS by about 2% for some write heavy tests

      Results: https://gist.github.com/mdcallag/6d8d2b282e1052f0ba99027e700db154

      Results: https://gist.github.com/mdcallag/6d8d2b282e1052f0ba99027e700db154

      Delete
  2. Jean-François GagnéAugust 9, 2024 at 10:25 AM

    Re binlogs: is sysbench generating transaction bigger than binlog_cache_size ? If yes, growing the cache might be interesting to explore to avoid the sys-call associated with spilling to tmp file (and reading this tmp file when. writing binlog).
    https://dev.mysql.com/doc/refman/8.0/en/replication-options-binary-log.html#sysvar_binlog_cache_size

    ReplyDelete
    Replies
    1. I confirmed that Binlog_cache_disk_use doesn't increase after the initial load so I don't think I should increase the size of the binlog cache

      Delete

Speedb vs RocksDB on a large server

I am happy to read about storage engines that claim to be faster than RocksDB. Sometimes the claims are true and might lead to ideas for mak...