Posts

Showing posts from December, 2019

What is the future of off-cpu analysis?

The answer is I don't know although I am really asking about tools I will use in 2020 and others will make different choices. I will have better answers in a few months. People have been telling me there is something better than PMP for many years. Perhaps that claim is finally true but this post from Percona suggests that PMP might have a future. I start with Brendan Gregg when I want to learn about modern performance debugging and he has a page on off-cpu analysis . From there I learn eBPF, perf and bcc are the future and I hope that is true. For now I will summarize my use cases and potential solutions. I have three use cases: Small server (< 10 cores) doing ~100k QPS on benchmarks Many-core server (<= 2 sockets, lots of cores/socket) doing x00k QPS on benchmarks. Servers in production Stalls are more tolerable in the first two cases. Crashes and multi-second stalls in production are rarely acceptable. Although when a production server is extremely unhappy the

Readahead

Q: What is the best readahead size? A: O_DIRECT Perhaps I agree with Dr. Stonebraker . This is my answer which might not be the correct answer. My reasons for O_DIRECT are performance, quality of service (QoS) and manageability and performance might get too much attention. I don't dislike Linux but the VM, buffered IO, readahead and page cache are there for all Linux use cases. They must be general purpose. Complex system software like a DBMS isn't general purpose and can do its own thing when needed. Also, I appreciate that kernel developers have done a lot to make Linux better for a DBMS. One of the perks at FB was easy access to many kernel developers. Most of my web-scale MySQL/InnoDB experience is with O_DIRECT. While InnoDB can use buffered IO we always chose O_DIRECT. Eventually, RocksDB arrived and it only did buffered IO for a few years. Then O_DIRECT support was added and perhaps one day the web-scale MyRocks team will explain what they use. I deal with readahe

Slides for talks I have given on MySQL, MongoDB and RocksDB

It all started for me with the Google patch for MySQL in April 2007. The Register summary of that included a Cringely story  repeated by  Nick Carr that Google might have shared the patch as part of a plan to dominate IT via cloud computing. I thought that was ridiculous. AWS said hold my beer and brought us Aurora. I donated to the Wayback Machine to offset my HW consumption for those links. A list of talks from the RocksDB team is here . This is an incomplete list of slide decks and videos from me: MySQL for High-Scale , MySQL Ignite MyRocks Overview , MyRocks meetup RocksDB: from faster to better , CMU with Coach Pavlo MySQL is Web Scale , Istanbul with Monty Program Scaling InnoDB, O'Reilly MySQL 2008 (no slides/video yet) This is not a web app , O'Reilly MySQL 2009 Running MySQL at Scale , FB Tech Talk 2010 High Throughput MySQL , O'Reilly MySQL 2010 Success with MySQL , MySQL Sunday 2010 MySQL at Facebook , O'Reilly MySQL 2010 NoSQL , Hot Stora

Historical - summary of the Google MySQL effort

This summarizes work we did on MySQL at Google. These posts used to be shared at code.google.com but it was shutdown. After reformatting most of these (it was a fun day for me, but sorry for the spam ) I remembered that somone had already done that in 2015. Thank you. It all started for me with the  Google patch for MySQL  in April 2007. The Register  summary of that  included a  Cringely story  repeated by  Nick Carr  that Google might have shared the patch as part of a plan to dominate IT via cloud computing. I thought that was ridiculous. AWS said hold my beer and brought us Aurora. My reformatted posts: work done by several people to fix the InnoDB rw-lock for SMP other work for InnoDB SMP performance user, table, index monitoring crash safe replication state parser changes for new features semisync replication and design doc SHOW STATUS SHOW INNODB STATUS my.cnf adding roles to mysql make user delayed the Google patch for MySQL 5.0 InnoDB IO performance

Historical - InnoDB IO Performance

This post was shared on code.google.com many years ago but code.google has been shutdown. It describes work done by my team at Google. I am interested in the history of technology and with some spare time have been enable to republish it. This is a collection from several posts about InnoDB IO performance Max dirty pages InnoDB provides a my.cnf variable, innodb_max_dirty_pages_pct, to set the maximum percentage of buffer pool pages that should be dirty. It then appears to ignores said variable for IO bound workloads (see this post from DimitriK). It doesn't really ignore the value. The problem is that it does not try hard enough to flush dirty pages even when there is available IO capacity. Specific problems include: one thread uses synchronous IO to write pages to disk. When write latency is significant (because O_DIRECT is used, SATA write cache is disabled, network attached storage is used, ext2 is used) then this thread becomes a bottleneck. This was fixed in the v2 G

Historical - Patch for MySQL 5.0

This post was shared on code.google.com many years ago but code.google has been shutdown. It describes work done by my team at Google. I am interested in the history of technology and with some spare time have been enable to republish it. This describes the patch for MySQL 5.0 provided by my team at Google. The early patches from us were difficult for others because they tended to include too many diffs. I didn't have time to better. After I moved to FB Steaphan Greene undertook the heroic effort to do proper diff management and the team has continued to follow his example. Introduction The code has been changed to make MySQL more manageable, available and scaleable. Many problems remain to be solved to improve SMP performance. This is a good start. The v3 patch and all future patches will be published with a BSD license which applies to code we have added and changed. Original MySQL sources has a GPL license. I am not sure if the patches were lost after the  googlecode sh

Historical - Make User Delayed

This post was shared on code.google.com many years ago but code.google has been shutdown. It describes work done by my team at Google. I am interested in the history of technology and with some spare time have been enable to republish it. I added support to rate limit DBMS accounts that were too busy. It wasn't successful in production for the obvious reason that it just shifts the convoy from the database to the app server -- the problem still exists. The better solution is to fix the application or improve DBMS capacity but that takes time. This describes SQL commands added to rate limit queries per account and per client IP. Per account rate limiting Per-account query delays use new SQL commands to set a query delay for an account. The delay is the number of milliseconds to sleep before running a SQL statement for the account. These values are transient and all reset to zero delay on server restart. The values are set by the command MAKE USER 'user' DELAYED 100 w

Historical - Adding Roles to MySQL

This post was shared on code.google.com many years ago but code.google has been shutdown. It describes work done by my team at Google. I am interested in the history of technology and with some spare time have been enable to republish it. I added support for roles to MySQL circa 2008. They arrived upstream  with MySQL 8 in 2018. I wasn't able to wait. I enjoyed the project more than expected. It wasn't hard in terms of algorithms or performance but I had to avoid mistakes to avoid security bugs and the upstream code was well written. I had a similar experience implementing BINARY_FLOAT and BINARY_DOUBLE at Oracle. There I got to learn about the IEEE754 standard and had to go out of my way to catch all of the corner cases. Plus I enjoyed working with Minghui Yang who did the PL/SQL part of it. MySQL roles and mapped users The access control model in MySQL does not scale for a deployment with thousands of accounts and thousands of tables. The problems are that similar priv

Historical - changes to my.cnf

This post was shared on code.google.com many years ago but code.google has been shutdown. It describes work done by my team at Google. I am interested in the history of technology and with some spare time have been enable to republish it. TODO - find the linked pages including: MysqlHttp - we added an http server to mysql for exporting monitoring. This was work by Nick Burrett InnodbAsyncIo - this explains perf improvements we made for InnoDB InnoDbIoTuning - explains more perf improvements we made for InnoDB We added these options: http_enable - start the embedded HTTP demon when ON, see MysqlHttp http_port - port on which HTTP listens, see MysqlHttp innodb_max_merged_io - max number of IO requests merged into one large request by a background IO thread innodb_read_io_threads, innodb_write_io_threads - number of background IO threads for prefetch reads and dirty page writes, see InnodbAsyncIo show_command_compatible_mysql4 - make output from some SHOW commands match tha

Historical - SHOW INNODB STATUS

This post was shared on code.google.com many years ago but code.google has been shutdown. It describes work done by my team at Google. I am interested in the history of technology and with some spare time have been enable to republish it. MySQL circa 2008 was hard to monitor so we added many things to SHOW STATUS and SHOW INNODB STATUS along with support for  user, table and index statistics . Most of the changes we made to SHOW INNODB STATUS are not listed here. I am not sure whether I ever described them. The most important changes were: list transactions last in the output in case the output was too long and truncated by InnoDB report average and worst-case IO latencies Introduction We have added more output to SHOW INNODB STATUS, reordered the output so that the list of transactions is printed list and increased the maximum size of the output that may be returned. Background threads: srv_master_thread_loops - counts work done by main background thread spinlock delay

Historical - SHOW STATUS changes

This post was shared on code.google.com many years ago but code.google has been shutdown. It describes work done by my team at Google. I am interested in the history of technology and with some spare time have been enable to republish it. MySQL circa 2008 was hard to monitor so we added many things to SHOW STATUS and SHOW INNODB STATUS along with support for user, table and index statistics . I added a counter for failures of calls to gettimeofday. That used to be a thing. We also changed mysqld to catch cross-socket differences in hardware clocks on old AMD motherboards. Fun times. Overview We have added extra values for monitoring. Much of the data from SHOW INNODB STATUS is now available in SHOW STATUS. We have also added rate limiting for both SHOW STATUS and SHOW INNODB STATUS to reduce the overhead from overzealous monitoring tools. This limits how frequently the expensive operations are done for these SHOW commands. Changes General Binlog_events - number of repl

Historical - design doc for semisync replication

Image
This can be read along with the initial semisync post . This post was shared on code.google.com many years ago but code.google has been shutdown. It describes work done by my team at Google. I am interested in the history of technology and with some spare time have been enable to republish it. Semisync replication was designed and implemented by Wei Li . He did a lot of work to make replication better for web-scale and then moved away from MySQL. Upstream reimplemented the feature which was a good decision given the constraints on our implementation time. Introduction Semi-sync replication blocks return from commit on a master until at least one slave acknowledges receipt of all replication events for that transaction. Note that the transaction is committed on the master first. Background MySQL replication is asynchronous. If a master fails after committing a transaction but before a slave copied replication events for that transaction, the transaction might be lost forever.

Historical - SemiSync replication

This post was shared on code.google.com many years ago but code.google has been shutdown. It describes work done by my team at Google. I am interested in the history of technology and with some spare time have been enable to republish it. Semisync was useful to but misunderstood. Lossless semisync was awesome but perhaps arrived too late as Group Replication has a brighter future. I like Lossless semisync because it provides similar durability guarantees to GR without the overhead of running extra instances locally. Not running extra instances locally for GR means that commit will be slow courtesy of the speed of light. I hope that GR adds support for log-only voters (witnesses). Regular semisync was misunderstood because people thought it provided extra durability. It didn't do that. It rate limited busy writers to reduce replication lag. It also limited a connection to at most one transaction that wasn't on at least one slave to reduce the amount of data that can be lost