
Showing posts from August, 2015

Cached linkbench performance for MySQL 5.7.8, 5.6, WebScale and MyRocks

This extends previous results for Linkbench to compare performance for a cached database with concurrent clients. My conclusions are:
- InnoDB compression in the Facebook patch for MySQL 5.6 is much faster for insert-heavy workloads than the same feature in upstream 5.6 and 5.7. Too bad those changes might not reach upstream.
- InnoDB transparent page compression is faster than non-transparent compression for write-heavy workloads, assuming that feature is OK to use on your servers (both options are sketched below).
- QPS for MyRocks suffers over time. We have work-in-progress to fix this. Otherwise it is already competitive with InnoDB.
- Compression with MyRocks is much better than with InnoDB for Linkbench data. That has also been true for real workloads.
- Load rates are higher for compressed InnoDB tables when partitioning is used for 5.6 but not for 5.7. I didn't debug the slowdown in 5.7. Partitioning has been a win in the past for IO-bound Linkbench because it reduces contention on the per-index mutex in InnoDB. Work has been done in …
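For readers who want to try these variants, here is a minimal sketch of the three kinds of table definitions being compared. The table name, columns, KEY_BLOCK_SIZE, compression algorithm and partition count are placeholders rather than the real Linkbench schema or the exact benchmark settings, and the MySQL C API is used only as a convenient way to show the DDL.

```cpp
// Sketch only: hypothetical tables and example values, not the Linkbench schema.
// Requires libmysqlclient; transparent compression also needs a filesystem with
// hole-punch support (e.g. XFS or ext4 on a recent kernel).
#include <mysql/mysql.h>
#include <cstdio>
#include <cstdlib>

static void run(MYSQL *conn, const char *sql) {
  if (mysql_query(conn, sql) != 0) {
    fprintf(stderr, "DDL failed: %s\n", mysql_error(conn));
    exit(1);
  }
}

int main() {
  MYSQL *conn = mysql_init(nullptr);
  if (!mysql_real_connect(conn, "127.0.0.1", "root", "", "test", 3306, nullptr, 0)) {
    fprintf(stderr, "connect failed: %s\n", mysql_error(conn));
    return 1;
  }

  // Non-transparent InnoDB compression: pages are (re)compressed to fit a
  // fixed KEY_BLOCK_SIZE.
  run(conn,
      "CREATE TABLE t_compressed (id BIGINT PRIMARY KEY, val VARBINARY(255)) "
      "ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8");

  // InnoDB transparent page compression (5.7): pages are compressed at write
  // time and the unused tail of each page is released via hole punching.
  run(conn,
      "CREATE TABLE t_transparent (id BIGINT PRIMARY KEY, val VARBINARY(255)) "
      "ENGINE=InnoDB COMPRESSION='zlib'");

  // Compressed table with partitioning, which can reduce contention on the
  // per-index mutex during the load.
  run(conn,
      "CREATE TABLE t_partitioned (id BIGINT PRIMARY KEY, val VARBINARY(255)) "
      "ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8 "
      "PARTITION BY HASH(id) PARTITIONS 8");

  mysql_close(conn);
  return 0;
}
```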

First day with InnoDB transparent page compression

I ran linkbench overnight for a database that started at 100G using MySQL 5.7.8 and InnoDB transparent page compression. After ~24 hours I have 1 mysqld crash with nothing in the error log. I don't know if that is related to bug 77738. I will attach gdb and hope for another crash. For more about transparent page compression read here, here and here. For concerns about the feature see the post by Domas. I previously wrote about this feature. On the bright side, this is a great opportunity for MyRocks, the RocksDB storage engine for MySQL. Then I ran 'dmesg -e' and got 81 complaints from XFS on the host that uses transparent compression. The warnings are from the time when the benchmark ran. My other test host isn't using hole-punch and doesn't get these warnings. I get the same error message below on a CentOS 6.6 host using a 3.10.53 kernel.
[Aug27 05:53] XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
[  +1.999375] XFS …

Single-threaded linkbench performance for MySQL 5.7, 5.6, WebScale and MyRocks

The good news is that features in the pending 5.7 release look great. More good news is that InnoDB transparent page compression might be faster than the original compression feature, assuming your storage system supports it. The bad news is that there are significant performance regressions for low-concurrency workloads. I previously reported this for 5.6 and 5.7.5 and have yet to see progress. While the focus on high-concurrency workloads has been important, we can't lose this much performance at low concurrency. I used linkbench with one client thread to load and then query a small & cached database. This was repeated for many configurations to determine the impact from compression and partitioning. The binlog was enabled but it and the InnoDB redo log were not synced on commit (a sketch of those settings follows below). The performance summary is:
- InnoDB transparent compression is faster for loads than non-transparent compression.
- Insert rates for 5.7.8 are much worse than for 5.6. The insert rate for 5.6 …
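As noted above, the binlog was enabled but neither it nor the InnoDB redo log was synced on commit. Below is a minimal sketch of one way to get that relaxed-durability behavior; the exact values used for the benchmark are an assumption, and log_bin itself must be enabled at server startup (for example with log-bin in my.cnf) since it is not a dynamic variable.

```cpp
// Sketch only: plausible relaxed-durability settings, not the exact benchmark
// configuration. Requires libmysqlclient.
#include <mysql/mysql.h>
#include <cstdio>

int main() {
  MYSQL *conn = mysql_init(nullptr);
  if (!mysql_real_connect(conn, "127.0.0.1", "root", "", nullptr, 3306,
                          nullptr, 0)) {
    fprintf(stderr, "connect failed: %s\n", mysql_error(conn));
    return 1;
  }
  // Do not fsync the binlog on commit.
  mysql_query(conn, "SET GLOBAL sync_binlog = 0");
  // Write the redo log at commit but let the background thread fsync it
  // (roughly once per second) instead of syncing on every commit.
  mysql_query(conn, "SET GLOBAL innodb_flush_log_at_trx_commit = 2");
  mysql_close(conn);
  return 0;
}
```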

How to use iTunes family sharing

It took me too long to figure this out. If you have family sharing enabled for iTunes and try to play content from the library of another member in the plan, then you will get a dialog asking for your password followed by another dialog claiming the computer is already authorized, and the content won't play. You have to download the content, you can't stream it, but that isn't the most annoying part. The shared content is hard to find. iTunes seems to change the UI in random ways each year. This advice works for version 12.2.2:
1. Click iTunes Store at the top.
2. Click Purchased (small text buried in the middle of the right hand side).
3. Use the pull down on the left hand side, next to Purchased, to switch from yourself to the other person.
4. Click on a show on the left hand side.
5. Click on Episodes at the top to switch from Seasons and see the episodes.

Transactions with RocksDB

RocksDB has work-in-progress to support transactions via optimistic and pessimistic concurrency control. The features need more documentation but we have shared the API, additional code for pessimistic and optimistic, and examples for pessimistic and optimistic. Concurrency control is a complex topic (see these posts) and is becoming popular again for academic research. An awesome PhD thesis on serializable snapshot isolation by Michael Cahill ended up leading to an implementation in PostgreSQL. We intend to use the pessimistic CC code for MyRocks, the RocksDB storage engine for MySQL. We had many discussions about the repeatable read semantics in InnoDB and PostgreSQL and decided on Postgres-style. That is my preference because the gap locking required by InnoDB is more complex. MongoRocks uses a simpler implementation of optimistic CC today and a brief discussion on CC semantics for MongoDB is here. AFAIK, write-write conflicts can be raised, but many are caught and …
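As an illustration, here is a minimal sketch of the pessimistic (TransactionDB) API from the headers linked above; the database path and keys are placeholders and error handling is reduced to asserts.

```cpp
// Minimal sketch of RocksDB pessimistic transactions; path and keys are
// placeholders, not MyRocks or MongoRocks code.
#include <cassert>
#include <string>

#include "rocksdb/utilities/transaction.h"
#include "rocksdb/utilities/transaction_db.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  rocksdb::TransactionDBOptions txn_db_options;

  rocksdb::TransactionDB* db = nullptr;
  rocksdb::Status s = rocksdb::TransactionDB::Open(
      options, txn_db_options, "/tmp/txn_example", &db);
  assert(s.ok());

  rocksdb::WriteOptions write_options;
  rocksdb::ReadOptions read_options;

  // With pessimistic CC, locks are taken as keys are read for update or written.
  rocksdb::Transaction* txn = db->BeginTransaction(write_options);
  std::string value;

  // Lock the key so a concurrent writer waits or fails instead of creating a
  // write-write conflict that is only detected at commit.
  s = txn->GetForUpdate(read_options, "counter", &value);

  s = txn->Put("counter", "42");
  assert(s.ok());

  // Commit releases the locks; Rollback() would discard the writes.
  s = txn->Commit();
  assert(s.ok());

  delete txn;
  delete db;
  return 0;
}
```

The optimistic variant looks almost the same: open an OptimisticTransactionDB instead, skip the locking, and expect Commit() to fail when a conflict is detected.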

Reducing write amplification in MongoRocks

After evaluating results for MongoRocks and TokuMX I noticed that the disk write rate per insert was ~3X larger for MongoRocks than for TokuMX. I ran tests with different configurations to determine whether I could reduce the write rate for MongoRocks during the insert benchmark tests on a server with slow storage. I was able to get a minor gain in throughput and a large reduction in write amplification by using Snappy compression on all levels of the LSM.

Configuration
The test was to load 500M documents with a 256 byte pad column using 10 threads. There were 3 secondary indexes to be maintained. In the default configuration RocksDB uses leveled compaction with no compression for L0 and L1 and Snappy compression for L2 and beyond (a sketch of the changed options follows below). I used QuickLZ for TokuMX 2.0.1 and assume that all database file writes were compressed by it. From other testing I learned that I need to increase the default value used for bytes_per_sync with MongoRocks to trigger calls to sync_file_range less frequently …
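To show what that configuration change looks like, here is a minimal sketch of the corresponding RocksDB options in C++. The values are examples rather than the ones used in the test, and MongoRocks applies these through its own configuration rather than code like this.

```cpp
// Sketch only: Snappy on all levels plus a larger bytes_per_sync.
// Example values, not the benchmark settings.
#include <cassert>

#include "rocksdb/db.h"
#include "rocksdb/options.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;

  // Default-style leveled compaction leaves L0 and L1 uncompressed and uses
  // Snappy from L2 down. Compressing every level reduced write amplification
  // in this test:
  options.compression = rocksdb::kSnappyCompression;
  options.compression_per_level.assign(
      static_cast<size_t>(options.num_levels), rocksdb::kSnappyCompression);

  // A larger bytes_per_sync means sync_file_range is called less often, with
  // more bytes written per call.
  options.bytes_per_sync = 8 * 1024 * 1024;  // example: 8MB

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/mongorocks_example", &db);
  assert(s.ok());
  delete db;
  return 0;
}
```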

Insert benchmark and queries on slow storage for MongoDB

This continues the use of the insert benchmark to understand performance for different MongoDB engines with slow storage. I previously explained load and query performance for a server with PCIe flash and documented the impact of too frequent invalidation in the plan cache. In this post I explain load and query performance for a server with a disk array -- SW RAID 0 across 15 SATA disks. My summary from this and the previous posts includes:
- Write-optimized engines (TokuMX, RocksDB) do much better than b-trees (WiredTiger, mmapv1) for secondary index maintenance on large databases because they don't do read-before-write (see the sketch after this list). InnoDB might be unique among the b-tree implementations -- it has an optimization to save on disk reads via the insert buffer.
- MongoDB invalidates cached plans too quickly, but that has been fixed recently. The performance impact from this can be much worse on slow storage.
- TokuMX 2.0.1 has a CPU bottleneck from the partial eviction code that also hurts …
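To make the read-before-write point concrete, here is a toy sketch of non-unique secondary index maintenance on an LSM, using plain RocksDB for illustration; the key encoding and names are made up and this is not how MongoRocks actually stores index entries.

```cpp
// Toy sketch: blind secondary index maintenance on an LSM. Key encodings are
// invented for illustration; not the MongoRocks implementation.
#include <cassert>
#include <string>

#include "rocksdb/db.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/index_example", &db);
  assert(s.ok());

  const std::string pk = "doc:42";                 // primary key
  const std::string doc = "{\"color\":\"blue\"}";  // document body
  const std::string sk = "idx:color:blue:doc:42";  // secondary index entry

  rocksdb::WriteOptions wo;
  // On an LSM both writes are "blind": no page is read first, the entries land
  // in the memtable and are merged into the index during compaction.
  s = db->Put(wo, pk, doc);
  assert(s.ok());
  s = db->Put(wo, sk, "");  // non-unique index entry: the value can be empty
  assert(s.ok());

  // A b-tree doing the same maintenance must read the leaf page that
  // "idx:color:blue:..." belongs to before inserting, so once the index no
  // longer fits in RAM the load becomes a random-read workload.

  delete db;
  return 0;
}
```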

Different kinds of copy-on-write for a b-tree: CoW-R, CoW-S

I use CoW-R and CoW-S to distinguish between two approaches to doing copy-on-write for a B-Tree.

CoW-R stands for copy-on-write random and is used by LMDB and WiredTiger. Pages are not written back in place; writes reuse the space from dead versions of previously written pages. Background threads are not used to copy live data from old log segments or extents. This tends to look like random IO. When the number of pages written per fsync is large then a clever IO scheduler can significantly reduce the random IO penalty for disks, and LMDB might benefit from that. WiredTiger has shown that compression is possible with CoW-R, but compression makes space management harder.

CoW-S stands for copy-on-write sequential. New writes are done to one or a few active log segments. Background threads are required to copy out live data from previously written log segments (a toy sketch of that cleaning step is below). Compared to CoW-R this has less random IO at the cost of extra write-amplification from cleaning old segments. It allows for a …
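To illustrate the CoW-S cleaning step, here is a toy sketch of a log-structured store where new writes append to an active segment and a cleaner copies live records out of an old segment before it is reused. It is an illustration of the idea only, not the implementation of any engine mentioned above, and the cleaner runs inline here rather than in a background thread.

```cpp
// Toy CoW-S sketch: sequential writes into segments plus segment cleaning.
// Not the implementation of any real engine.
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

struct Record { std::string key, value; };
struct Segment { std::vector<Record> records; };
struct Location { size_t segment; size_t offset; };

class CowSStore {
 public:
  void Put(const std::string& key, const std::string& value) {
    // Always append to the active (last) segment: writes stay sequential.
    if (segments_.empty() || segments_.back().records.size() >= kSegmentSize)
      segments_.emplace_back();
    segments_.back().records.push_back({key, value});
    index_[key] = {segments_.size() - 1, segments_.back().records.size() - 1};
  }

  bool Get(const std::string& key, std::string* value) const {
    auto it = index_.find(key);
    if (it == index_.end()) return false;
    *value = segments_[it->second.segment].records[it->second.offset].value;
    return true;
  }

  // Cleaning: copy live records out of an old segment into the active one so
  // the old segment can be reused. This is the extra write amplification that
  // CoW-S pays to keep new writes sequential.
  void CleanSegment(size_t seg) {
    if (seg + 1 >= segments_.size()) return;  // never clean the active segment
    for (size_t off = 0; off < segments_[seg].records.size(); ++off) {
      Record rec = segments_[seg].records[off];  // copy: Put may reallocate
      auto it = index_.find(rec.key);
      if (it != index_.end() && it->second.segment == seg &&
          it->second.offset == off) {
        Put(rec.key, rec.value);  // still live: relocate it
      }
    }
    segments_[seg].records.clear();  // the whole segment is now reusable
  }

 private:
  static constexpr size_t kSegmentSize = 4;
  std::vector<Segment> segments_;
  std::unordered_map<std::string, Location> index_;
};

int main() {
  CowSStore store;
  // Overwrite a small set of keys so the oldest segment fills with dead versions.
  for (int i = 0; i < 10; ++i)
    store.Put("k" + std::to_string(i % 3), "v" + std::to_string(i));
  store.CleanSegment(0);  // normally done by a background cleaner thread
  std::string v;
  if (store.Get("k0", &v)) std::cout << "k0=" << v << "\n";
  return 0;
}
```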

Insert benchmark & queries for MongoDB on fast storage

This extends the performance evaluation using the insert benchmark on a server with fast storage. After the load, extra tests were run using different numbers of insert and query threads. The load was done for 2B documents with a 256 byte pad field. The databases are at least 500G and the test server has 144G of RAM and 40 hyperthread cores. My conclusions are:
- WiredTiger has the best range query performance, followed closely by RocksDB. While the LSM used by RocksDB can suffer a penalty on range reads, that wasn't visible for this workload.
- mmapv1 did OK on range query performance when there were no concurrent inserts. Adding concurrent inserts reduces range query QPS by 4X. Holding exclusive locks while doing disk reads for index maintenance is a problem, see server-13225.
- TokuMX 2.0.1 has a CPU bottleneck that hurts range query performance. I await advice from TokuMX experts to determine whether tuning can fix this. I have been using a configuration provided by experts.