Posts

Showing posts from March, 2020

Compiling from source for Amazon Linux 2

Different is worse. I used to struggle with make & automake but I knew enough to get things done. Now I have fun with make, cmake, maven and gradle. I am sure this proliferation solves problems for the people who create the build scripts that I use. But it isn't a good experience for the intermittent user -- different is worse because my skill level per tool is lower. What is Amazon Linux 2? It has a landing page. I call it AL2 below. It claims to be like CentOS, RHEL and Fedora, and it uses yum. I know how to use Google to get answers for those distros when trying to figure out how to build something from source. But it isn't clear to me that packages that exist for CentOS also exist for AL2. One example of the challenge is that I ended up installing jemalloc from source because I couldn't find the package without advice from an expert. I hope the output from cat /etc/os-release on AL2 is improved. Below are examples for AL2 and Ubuntu 18.04. Ubuntu specific tha
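For anyone in the same spot, a minimal sketch of building jemalloc from source on AL2. The version and download URL below are examples, not what I necessarily used -- check the jemalloc releases page for current ones:

# example version/URL -- see https://github.com/jemalloc/jemalloc/releases
sudo yum install -y gcc make bzip2 wget
wget https://github.com/jemalloc/jemalloc/releases/download/5.2.1/jemalloc-5.2.1.tar.bz2
tar xjf jemalloc-5.2.1.tar.bz2 && cd jemalloc-5.2.1
./configure --prefix=/usr/local
make -j$(nproc) && sudo make install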

kernel lockdown vs blktrace

I am trying to use blktrace to determine which files are the source of disk reads for a database that uses buffered IO. The server runs Ubuntu 18.04 on bare metal and the boot uses UEFI, not legacy. blktrace doesn't work, so it's time to debug. Things that don't fix it include upgrading from the 4.15 to the 5.3 HWE kernel and disabling apparmor. Next up is disabling kernel lockdown via mokutil --disable-validation. Alas, blktrace still fails at startup. After running mokutil and then rebooting there are still a few messages in dmesg output about lockdown, so I wonder whether it was fully disabled:
Lockdown: Hibernation is restricted; see man kernel_lockdown.7
Lockdown: /dev/mem,kmem,port is restricted; see man kernel_lockdown.7
OK, let's read the man page. Great, it doesn't exist -- not for Ubuntu, nor for other distros. There is a draft, but I am starting to get the impression that lockdown wasn't ready for prime time. And Linus had a strong opinion about it in 2018. Next up is a s
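For reference, a sketch of how a blktrace session normally looks once it works; the device name and the 30-second window are example values:

# trace block-layer events on an example device for 30 seconds
sudo blktrace -d /dev/sda -w 30 -o trace
# decode the per-CPU trace.blktrace.* files into readable events
blkparse -i trace | head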

Insert Benchmark v3

I expect to replace the insert benchmark later this year. The insert benchmark I have been using might be ibench v2, as ibench v1 came from Tokutek, so the replacement I write would be ibench v3. The reasons for a replacement include:
- Switch from Python to something more performant (Java or Go), as it is difficult to sustain high QPS rates (100k/second or more) with Python and a Java/Go client will consume less CPU. I assume the DBMS I care about (MongoDB, MySQL, Postgres) all have good golang clients.
- Make the benchmark less synthetic. Early ibench was insert-only for a table with a PK and 3 secondary indexes. Eventually I added support for short range queries. It has always used a uniform distribution. The plan for ibench v3 is to model a monitoring workload -- many inserts, some updates, many queries. It will use a PK and 1 or 2 secondary indexes.
- Continue to support multiple DBMS -- MongoDB, MySQL and Postgres.
- As a bonus, don't ignore coordinated omission.
I still need

Tuning space and write amplification to minimize cost

This post uses math to show how to tune space and write amplification to minimize storage costs for an index structure that uses index+log. The result, minimal cost, is true assuming my model is true. But at this point I will only claim that the model is truthy. I look forward to more results in this area from the smart people at DASlab and elsewhere. I wonder if database economics is a good name for this topic. I explained the index+log index structure here and here, and my truthy model assumes that write and space amplification are functions of PctUsed, the percent of available storage that is used by the index structure. The model is:
Space amplification = 100 / PctUsed
Write amplification = 100 / (100 - PctUsed)
In what follows I use SpaceAmp and WriteAmp for space and write amplification. When PctUsed is X then the remaining space on the storage device (100-X) is free, not used by anything. The formulas mean that a deployment can trade between SpaceAmp and WriteAmp by adjus
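To make the tradeoff concrete, here is a sketch of the minimization, assuming total cost is a weighted sum of the two amplifications with weights $c_s$ (cost per unit of space) and $c_w$ (cost per unit of write endurance); the weights and the sum are my notation for this sketch, not taken from the post:

\[
\mathrm{cost}(p) = c_s \cdot \frac{100}{p} + c_w \cdot \frac{100}{100 - p}
\]
\[
\frac{d\,\mathrm{cost}}{dp} = -\frac{100\,c_s}{p^2} + \frac{100\,c_w}{(100-p)^2} = 0
\;\Rightarrow\;
\frac{100 - p}{p} = \sqrt{\frac{c_w}{c_s}}
\;\Rightarrow\;
p = \frac{100}{1 + \sqrt{c_w / c_s}}
\]

As a quick check: with $c_w = c_s$ the optimum is PctUsed = 50, where SpaceAmp = WriteAmp = 2.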

Review of Benchmarking RocksDB Key-Value Workloads at Facebook

At some level all database engines have a key-value API, even if it isn't exposed to the user. One name for this in a SQL DBMS is the data layer. The data layer is RocksDB for MyRocks & MongoRocks, and WiredTiger for MongoDB. FAST '20 has a paper on characterizing RocksDB key-value workloads at Facebook. One of the workloads is UDB, and UDB uses MyRocks, a SQL DBMS. Benchmarks are useful to me but there is always room for improvement. Two common problems are the lack of load variance and the lack of complexity in access patterns. By the lack of load variance I mean that the workload doesn't change over time, which fails to capture the daily spikes that occur for web-scale workloads as different parts of the world wake and sleep. Benchmarks also fail to include daily and intermittent operational tasks like backup, schema change, data migration and archiving old data. This paper explains the complexity in access patterns for three RocksDB use cases and shows how to rep
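The paper's workload models were added to RocksDB's db_bench as the mixgraph benchmark. A sketch of an invocation is below; the ratio values are placeholders and the exact flag names may differ across RocksDB versions:

# mixed get/put/seek workload modeled on the paper; placeholder ratios,
# check your db_bench --help for the flags your version supports
./db_bench --benchmarks=mixgraph \
  --use_existing_db=1 \
  --mix_get_ratio=0.85 --mix_put_ratio=0.14 --mix_seek_ratio=0.01 \
  --num=100000000 --reads=10000000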

Building MySQL 5.6 from source on Ubuntu 18.04

This explains the steps I used to build MySQL 5.6.35 from source on Ubuntu 18.04. I build with the perf schema disabled, and if you do that then don't use Connector/J 5.1.48 (5.1.47 is OK) or you won't be able to connect thanks to bug 98919. The new Connector/J dependency on perf schema variables can be a problem for a DBMS that implements the MySQL API -- see this bug report for MemSQL.
- Install an older version of gcc and g++: sudo apt install gcc-5 g++-5
- Unpack the source. I used 5.6.35.
- Remove the connection_control plugin because it does not compile when the perf schema is disabled at compile time: rm -rf $SRC_ROOT/plugin/connection_control
- Run cmake. See below. Note that I disable the perf schema.
My cmake script:
prefix=$1
CC=gcc-5 CXX=g++-5 \
cmake .. \
  -DBUILD_CONFIG=mysql_release \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DCMAKE_INSTALL_PREFIX:PATH=$prefix \
  -DWITH_SSL="bundled" \
  -DWITH_ZLIB="bundle
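After cmake succeeds, the build and install are the usual make sequence; a minimal sketch, assuming $prefix is set as in the script above:

# compile with all cores, then install under $prefix
make -j$(nproc)
make install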

RDBMS != SQL DBMS

We use RDBMS as another name for SQL DBMS, but SQL isn't relational. That isn't news; see this web site and book. SQL allows for but doesn't require the relational approach, and 1NF or 3NF are optional. JSON is in the SQL:2016 spec. What would Codd think? Use Oracle as a SQL DBMS example: first there was support for collection data types, then there was XML, and eventually JSON arrived. These let you violate 1NF. I won't argue whether these should be used. I only claim they can be used. Have there been surveys to document how often the relational approach is used with a SQL DBMS? I assume it is better to think of a distribution of approaches (a value between 0 and 1, where 0 is SQL and 1 is relational) rather than a binary choice of relational vs SQL (not relational). I might call the SQL endpoint the pragmatic approach, but that introduces bias. While I have spent a long time working on SQL DBMS, I am usually working under the hood and don't design applications.
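As a concrete example of the 1NF point, a sketch using MySQL 5.7+ syntax (the table, column and database names are made up for illustration): storing a JSON array puts a multi-valued attribute into a single column, which is legal SQL but not in 1NF.

mysql test <<'SQL'
-- a multi-valued attribute in one column: legal SQL, not 1NF
CREATE TABLE events (id INT PRIMARY KEY, tags JSON);
INSERT INTO events VALUES (1, '["alpha", "beta"]');
SQL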