Thursday, August 8, 2024

MySQL regressions: delete vs InnoDB

I started to look at CPU overheads in MyRocks and upstream InnoDB. While I am happy to file bugs for MyRocks as they are likely to be fixed, I am not sure how much energy I want to put into proper bug reports for upstream InnoDB. So I will just write blog posts about them for now.

I created flamegraphs while running sysbench with cached databases (more likely to be CPU bound) and the problem here occurs on an 8-core PN53 where sysbench was run with 1 thread. Here I use perf record -e cycles to collect data for flamegraphs and then I focus on the percentage of samples in a given function (and its callees) as a proxy for CPU overhead.

The numbers below are the percentage of perf samples from a function and its callees and this is a proxy for CPU time. Here I used cycles as the HW counter with perf but I have more tests in progress to get flamegraphs with other counters. The flamegraphs are here.

I am curious about whether more data is written to the binlog per transaction with 8.0.37, but I have to repeat tests to get that data. While my helper scripts archive results from SHOW GLOBAL STATUS which provides the Binlog_bytes_written counter, that counter is only in MariaDB and Percona Server. For upstream MySQL I have to use this much more fun query to get the answer from the perf schema.
select SUM_NUMBER_OF_BYTES_WRITE from file_summary_by_event_name where EVENT_NAME="wait/io/file/sql/binlog"
Time is spent in background and foreground work. The background work are mostly purge and writeback. The foreground work is networking, SQL and binlog. At a high level the percentage of time in SQL + binlog is about 2X larger with MySQL 8.0.37 vs 5.6.51 from ~30% to ~60%. It will take time to understand why.

5.6.51
---
17.77   mysql_delete
   0.35   QUICK_RANGE_SELECT::reset
   1.69   SQL_SELECT::test_quick_select
   0.30   THD::binlog_query
   1.40   binlog_log_row
   4.52   handler::ha_delete_row
   0.60   lock_tables
   1.16   mysql_prepare_delete
   1.28   open_normal_and_derived_tables
   0.73   optimize_cond
   4.06   rr_quick

 9.82   ha_commit_trans
   5.61   MYSQL_BIN_LOG::commit
     1.83   MYSQL_BIN_LOG::process_commit_stage_queue
     0.86   MYSQL_BIN_LOG::process_flush_stage_queue
     2.26   my_b_flush_io_cache
 3.78   ha_prepare_low

5.7.44
---
36.79   Sql_cmd_delete::execute
   0.73   QUICK_RANGE_SELECT::reset
   1.58   Sql_cmd_delete::mysql_prepare_delete
   0.60   THD::binlog_query
   3.33   binlog_log_row
   9.59   handler::ha_delete_row
   1.47   lock_tables
   2.09   open_tables_for_query
   1.03   optimize_cond
   8.65   rr_quick
   3.45   test_quick_select

18.04   ha_commit_trans
  16.28   MYSQL_BIN_LOG::commit
     3.81   MYSQL_BIN_LOG::flush_cache_to_file
     4.22   MYSQL_BIN_LOG::process_commit_stage_queue
     6.67   MYSQL_BIN_LOG::process_flush_stage_queue
       1.70   MYSQL_BIN_LOG::flush_thread_caches
       4.58   ha_flush_logs
   0.99   ha_prepare_low

8.0.37
---
37.60   Sql_cmd_dml::execute
  28.57   Sql_cmd_delete::delete_from_single_table
     6.80   IndexRangeScanIterator::Read
     0.83   THD::binlog_query
     6.89   binlog_log_row
     6.32   handler::ha_delete_row
     1.24   optimize_cond
     3.27   test_quick_select
   0.73   Sql_cmd_delete::check_privileges
   0.81   Sql_cmd_delete::restore_cmd_properties
   1.83   lock_tables
   4.04   open_tables_for_query

20.26   ha_commit_trans
  17.41   MYSQL_BIN_LOG::commit
     0.75   MYSQL_BIN_LOG::change_stage
     3.09   MYSQL_BIN_LOG::flush_cache_to_file
     3.29   MYSQL_BIN_LOG::process_commit_stage_queue
     8.37   MYSQL_BIN_LOG::process_flush_stage_queue
       5.42   MYSQL_BIN_LOG::fetch_and_process_flush_stage_queue
       2.40   MYSQL_BIN_LOG::flush_thread_caches
   1.37   MYSQL_BIN_LOG::prepare

5 comments:

  1. Jean-François GagnéAugust 9, 2024 at 10:20 AM

    Something new in binlogs in 8.0 is `transaction_write_set_extraction = XXHASH64` (it is OFF in 5.7 and does not exist in 5.6). An interesting test would be to set to OFF in 8.0 to see if this impacts performance significantly.

    Something else is `binlog-transaction-dependency-tracking`. This is exposed as a parameter in 8.0, always doing `COMMIT_ORDER` in 5.7, and inexistent in 5.6. There is no `OFF` option in 8.0. Unclear what is the overhead of this. If this is something of interest, we could check the overhead of this (one way could be to implement a `OFF` option for this parameter in 8.0 and re-run the tests).

    ReplyDelete
    Replies
    1. Starting more tests now including one that disables transaction_write_set_extraction

      Delete
    2. Repeated the test on a small server using two no configs:
      z11a_nops_bee -> disables perf schema
      z11a_twse_bee -> adds transaction_write_set_extraction=OFF

      Neither solves the problem:
      z11a_nops_bee -> improves QPS by about 5% in most cases
      z11a_twse_bee -> improves QPS by about 2% for some write heavy tests

      Results: https://gist.github.com/mdcallag/6d8d2b282e1052f0ba99027e700db154

      Results: https://gist.github.com/mdcallag/6d8d2b282e1052f0ba99027e700db154

      Delete
  2. Jean-François GagnéAugust 9, 2024 at 10:25 AM

    Re binlogs: is sysbench generating transaction bigger than binlog_cache_size ? If yes, growing the cache might be interesting to explore to avoid the sys-call associated with spilling to tmp file (and reading this tmp file when. writing binlog).
    https://dev.mysql.com/doc/refman/8.0/en/replication-options-binary-log.html#sysvar_binlog_cache_size

    ReplyDelete
    Replies
    1. I confirmed that Binlog_cache_disk_use doesn't increase after the initial load so I don't think I should increase the size of the binlog cache

      Delete

Battle of the Mallocators

If you use RocksDB and want to avoid OOM then use jemalloc or tcmalloc and avoid glibc malloc. That was true in 2015 and remains true in 202...