MySQL regressions: skip_concurrency_ticket

I started to look at CPU overheads in MyRocks and upstream InnoDB. While I am happy to file bugs for MyRocks as they are likely to be fixed, I am not sure how much energy I want to put into proper bug reports for upstream InnoDB. So I will just write blog posts about them for now.

I created flamegraphs while running sysbench with cached databases (more likely to be CPU bound) and the problem here occurs on an 8-core PN53 where sysbench was run with 1 thread. Here I use perf record -e cycles to collect data for flamegraphs and then I focus on the percentage of samples in a given function (and its callees) as a proxy for CPU overhead.
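To make "percentage of samples in a given function (and its callees)" concrete, here is a minimal sketch of how that number could be computed from folded stacks, the `frame;frame;frame count` text format that flamegraph tooling consumes. The helper name `inclusive_share` and the sample stacks are hypothetical and made up for illustration. They are not measurements from these tests and not the scripts I actually use.

```python
def inclusive_share(folded_lines, function):
    """Return the percentage of samples whose stack contains `function`.

    Each input line is assumed to look like "frame1;frame2;...;frameN count",
    with the root frame first and the leaf frame last.
    """
    total = 0
    hits = 0
    for line in folded_lines:
        line = line.strip()
        if not line:
            continue
        # The sample count is the last whitespace-separated token.
        stack, _, count = line.rpartition(" ")
        n = int(count)
        total += n
        # A sample counts toward the function if the function appears anywhere
        # in the stack, which covers time spent in its callees as well.
        if function in stack.split(";"):
            hits += n
    return 100.0 * hits / total if total else 0.0


# Hypothetical folded stacks, NOT real data from the benchmark.
sample = [
    "mysqld;row_search_mvcc;row_sel_store_mysql_rec 30",
    "mysqld;row_search_mvcc;row_sel_store_mysql_rec;memcpy 10",
    "mysqld;row_search_mvcc 57",
    "mysqld;innobase_srv_conc_enter_innodb;skip_concurrency_ticket 3",
]

print(inclusive_share(sample, "row_sel_store_mysql_rec"))
print(inclusive_share(sample, "skip_concurrency_ticket"))
```

The match is on exact frame names, so a symbol that shows up with a class prefix (for example `row_prebuilt_t::skip_concurrency_ticket`) would have to be passed in that form.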

The problem here is that during the scan benchmark the skip_concurrency_ticket function accounts for ~3% of CPU in 8.0.37, half that in 5.7.44 and the function doesn't exist in 5.6.51. It is called from innobase_srv_conc_enter_innodb which was a bit simpler in 5.6.

The flamegraphs (*.svg files) are here.

Also visible in those flamegraphs is the percentage of samples (a proxy for CPU overhead) accounted for by row_sel_store_mysql_rec and its callees:
  • 27.20% in 5.6.51
  • 25.46% in 5.7.44
  • 31.17% in 8.0.28
  • 34.99% in 8.0.37
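A small sketch of how the numbers above compare against 5.6.51, using only the four percentages listed. Keep in mind that each value is a share of that version's own samples, so the ratio shows how the profile shifted and is not a direct measure of absolute CPU time.

```python
# Share of samples in row_sel_store_mysql_rec and callees, from the list above.
shares = {"5.6.51": 27.20, "5.7.44": 25.46, "8.0.28": 31.17, "8.0.37": 34.99}

base = shares["5.6.51"]
# Ratio of each version's share to the 5.6.51 share.
relative = {version: share / base for version, share in shares.items()}

for version, share in shares.items():
    diff = share - base  # difference in percentage points
    print(f"{version}: {share:.2f}% of samples, "
          f"{diff:+.2f} points, {relative[version]:.2f}x vs 5.6.51")
```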

Comments

  1. Can you do perf annotate row_prebuilt_t::skip_concurrency_ticket, please?
    This function seems to do just `mov`s and `test`s and `cmp*`s - which looks like one would expect from reading the C++ code which is just a bunch of ifs on various fields. Perhaps the problem is that they are difficult to predict?
    Maybe it would help to reorder the ifs so that we get to the answer quicker?
    Can you try sampling based on mispredictions (`perf record -e branch-misses..`)?
    Alternatively, maybe the problem is with fetching one of these values from ram somehow? Can you try sampling based on cache misses (`perf record -e cache-misses...`)?

  2. (posting again as I can't see my previous post)
    Can you please do `perf annotate row_prebuilt_t::skip_concurrency_ticket`?
    This function compiles to assembly which looks just like one would expect from C++ code: a bunch of `mov`s and `cmp*`/`test`s.
    So, perhaps the problem is with branch prediction or cache misses.
    You can try sampling with:
    perf record -e branch-misses ...
    perf record -e cache-misses ...
    One thing to try is to change the order of ifs so that we get to the most probable return quicker.

    Replies
    1. Comments don't show up until I approve them because I don't want spam here, and I get more than I expect.

      I assume that stalls from cache and TLB are a big part of the problem. I am repeating tests to collect flame graphs for all of the supported HW counters. Alas, that will take time to finish.

      cache-references \
      cache-misses \
      branches \
      branch-misses \
      L1-dcache-loads \
      L1-dcache-load-misses \
      L1-icache-load-misses \
      dTLB-loads \
      dTLB-load-misses \
      iTLB-load-misses \
      iTLB-loads \
      instructions \

    2. I don't have the files needed to run "perf annotate". My helper scripts remove files after creating flamegraphs to save on disk space. Even with the removals it takes ~500M per experiment (experiment == run benchmark for one DBMS).

      I will keep this in mind (trying "perf annotate") for later.

