I started to look at CPU overheads in MyRocks and upstream InnoDB. While I am happy to file bugs for MyRocks as they are likely to be fixed, I am not sure how much energy I want to put into proper bug reports for upstream InnoDB. So I will just write blog posts about them for now.
I created flamegraphs while running sysbench with cached databases (more likely to be CPU bound) and the problem here occurs on an 8-core PN53 where sysbench was run with 1 thread. Here I use perf record -e cycles to collect data for flamegraphs and then I focus on the percentage of samples in a given function (and its callees) as a proxy for CPU overhead.The numbers below are the percentage of perf samples from a function and its callees and this is a proxy for CPU time. Here I used cycles as the HW counter with perf but I have more tests in progress to get flamegraphs with other counters. The flamegraphs are here.
I am curious about whether more data is written to the binlog per transaction with 8.0.37, but I have to repeat tests to get that data. While my helper scripts archive results from SHOW GLOBAL STATUS which provides the Binlog_bytes_written counter, that counter is only in MariaDB and Percona Server. For upstream MySQL I have to use this much more fun query to get the answer from the perf schema.
I am curious about whether more data is written to the binlog per transaction with 8.0.37, but I have to repeat tests to get that data. While my helper scripts archive results from SHOW GLOBAL STATUS which provides the Binlog_bytes_written counter, that counter is only in MariaDB and Percona Server. For upstream MySQL I have to use this much more fun query to get the answer from the perf schema.
select SUM_NUMBER_OF_BYTES_WRITE from file_summary_by_event_name where EVENT_NAME="wait/io/file/sql/binlog"
Time is spent in background and foreground work. The background work are mostly purge and writeback. The foreground work is networking, SQL and binlog. At a high level the percentage of time in SQL + binlog is about 2X larger with MySQL 8.0.37 vs 5.6.51 from ~30% to ~60%. It will take time to understand why.
5.6.51
---
17.77 mysql_delete
0.35 QUICK_RANGE_SELECT::reset
1.69 SQL_SELECT::test_quick_select
0.30 THD::binlog_query
1.40 binlog_log_row
4.52 handler::ha_delete_row
0.60 lock_tables
1.16 mysql_prepare_delete
1.28 open_normal_and_derived_tables
0.73 optimize_cond
4.06 rr_quick
9.82 ha_commit_trans
5.61 MYSQL_BIN_LOG::commit
1.83 MYSQL_BIN_LOG::process_commit_stage_queue
0.86 MYSQL_BIN_LOG::process_flush_stage_queue
2.26 my_b_flush_io_cache
3.78 ha_prepare_low
5.7.44
---
36.79 Sql_cmd_delete::execute
0.73 QUICK_RANGE_SELECT::reset
1.58 Sql_cmd_delete::mysql_prepare_delete
0.60 THD::binlog_query
3.33 binlog_log_row
9.59 handler::ha_delete_row
1.47 lock_tables
2.09 open_tables_for_query
1.03 optimize_cond
8.65 rr_quick
3.45 test_quick_select
18.04 ha_commit_trans
16.28 MYSQL_BIN_LOG::commit
3.81 MYSQL_BIN_LOG::flush_cache_to_file
4.22 MYSQL_BIN_LOG::process_commit_stage_queue
6.67 MYSQL_BIN_LOG::process_flush_stage_queue
1.70 MYSQL_BIN_LOG::flush_thread_caches
4.58 ha_flush_logs
0.99 ha_prepare_low
8.0.37
---
37.60 Sql_cmd_dml::execute
28.57 Sql_cmd_delete::delete_from_single_table
6.80 IndexRangeScanIterator::Read
0.83 THD::binlog_query
6.89 binlog_log_row
6.32 handler::ha_delete_row
1.24 optimize_cond
3.27 test_quick_select
0.73 Sql_cmd_delete::check_privileges
0.81 Sql_cmd_delete::restore_cmd_properties
1.83 lock_tables
4.04 open_tables_for_query
20.26 ha_commit_trans
17.41 MYSQL_BIN_LOG::commit
0.75 MYSQL_BIN_LOG::change_stage
3.09 MYSQL_BIN_LOG::flush_cache_to_file
3.29 MYSQL_BIN_LOG::process_commit_stage_queue
8.37 MYSQL_BIN_LOG::process_flush_stage_queue
5.42 MYSQL_BIN_LOG::fetch_and_process_flush_stage_queue
2.40 MYSQL_BIN_LOG::flush_thread_caches
1.37 MYSQL_BIN_LOG::prepare
Something new in binlogs in 8.0 is `transaction_write_set_extraction = XXHASH64` (it is OFF in 5.7 and does not exist in 5.6). An interesting test would be to set to OFF in 8.0 to see if this impacts performance significantly.
ReplyDeleteSomething else is `binlog-transaction-dependency-tracking`. This is exposed as a parameter in 8.0, always doing `COMMIT_ORDER` in 5.7, and inexistent in 5.6. There is no `OFF` option in 8.0. Unclear what is the overhead of this. If this is something of interest, we could check the overhead of this (one way could be to implement a `OFF` option for this parameter in 8.0 and re-run the tests).
Starting more tests now including one that disables transaction_write_set_extraction
DeleteRepeated the test on a small server using two no configs:
Deletez11a_nops_bee -> disables perf schema
z11a_twse_bee -> adds transaction_write_set_extraction=OFF
Neither solves the problem:
z11a_nops_bee -> improves QPS by about 5% in most cases
z11a_twse_bee -> improves QPS by about 2% for some write heavy tests
Results: https://gist.github.com/mdcallag/6d8d2b282e1052f0ba99027e700db154
Results: https://gist.github.com/mdcallag/6d8d2b282e1052f0ba99027e700db154
Re binlogs: is sysbench generating transaction bigger than binlog_cache_size ? If yes, growing the cache might be interesting to explore to avoid the sys-call associated with spilling to tmp file (and reading this tmp file when. writing binlog).
ReplyDeletehttps://dev.mysql.com/doc/refman/8.0/en/replication-options-binary-log.html#sysvar_binlog_cache_size
I confirmed that Binlog_cache_disk_use doesn't increase after the initial load so I don't think I should increase the size of the binlog cache
Delete