As much as I like the technology from in-memory DBMS vendors like MemSQL, VoltDB and MySQL Cluster, I am amused at some of the marketing that describes capacity planning as a function of QPS. It usually isn't true that when a workload has a peak QPS of X and the peak per-server QPS for the product is Y, then X/Y nodes are sufficient. The Oracle keynote from Mr. Big is an example of this.
Work in web-scale database land is more complex courtesy of additional constraints including database size, network latency and network bandwidth. Workloads don't just require X QPS; they require X QPS over a database of Y PB with requirements for response time. When Y is large it helps to not keep all data in RAM, and that is easier when a workload has data access skew so the hot data can be in RAM while much more data is on disk.
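To make that concrete, here is a back-of-the-envelope sketch with made-up numbers (peak QPS, per-node QPS, database size, storage per node and replica count are all assumptions, not real figures). It shows how the node count can be driven by storage rather than by QPS when the database is large.

```python
import math

# Back-of-the-envelope capacity planning with made-up numbers.
peak_qps = 5_000_000           # workload peak QPS (assumption)
per_node_qps = 500_000         # peak QPS one node can serve (assumption)
database_size_tb = 2_000       # 2 PB of data (assumption)
per_node_storage_tb = 8        # usable storage per node (assumption)
replicas = 3                   # copies kept for availability (assumption)

nodes_for_qps = math.ceil(peak_qps / per_node_qps)
nodes_for_storage = math.ceil(database_size_tb * replicas / per_node_storage_tb)

# The cluster must satisfy both constraints, so take the larger count.
nodes_needed = max(nodes_for_qps, nodes_for_storage)
print(nodes_for_qps, nodes_for_storage, nodes_needed)   # 10 750 750
```

With these numbers the QPS math suggests 10 nodes while the storage math demands 750, so planning from QPS alone would miss by almost two orders of magnitude.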
With fast storage there are clever ways to keep indexes in RAM and data on SSD, and with proper indexing queries are limited to one or two reads from the storage device. This can provide excellent response times without keeping everything in RAM. The latency from an SSD read might double the query response time compared to an in-memory database, which is acceptable for many workloads. Aerospike seems to be good at this, but it can also be done with InnoDB (covering indexes and a clustered PK FTW).
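A rough latency model with assumed numbers (both latencies below are hypothetical) shows why a design that keeps indexes in RAM and pays at most one SSD read per query can stay within a factor of two of an in-memory design:

```python
# Rough latency model: index lookups in RAM plus at most one SSD read per query.
in_memory_query_us = 100        # CPU + RAM work per query, microseconds (assumption)
ssd_read_us = 100               # one read from a fast SSD, microseconds (assumption)

all_in_ram = in_memory_query_us
index_in_ram_data_on_ssd = in_memory_query_us + ssd_read_us

print(f"all in RAM: {all_in_ram} us")
print(f"index in RAM + 1 SSD read: {index_in_ram_data_on_ssd} us")
# With these numbers the SSD read roughly doubles response time,
# which is acceptable for many workloads.
```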
Network latency and bandwidth are additional constraints for capacity planning. While they don't determine whether data must be in RAM, they might determine how many copies are required for both the database and any caches that accompany it (by cache I mean something like memcached). With more copies of the data you are more likely to have a copy close to the clients (clients == web tier), and that reduces the network latency added to queries. Even with a database capable of 1B QPS a deployment can require caches to avoid the extra network latency, as latency and bandwidth to that fast database server can be a bottleneck.
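As a sketch of why a single very fast server can still be network-bound, here is a bandwidth estimate with assumed numbers (the response size and NIC speed are hypothetical):

```python
# Can a single server really serve 1B QPS to remote clients? Check the NIC.
server_qps = 1_000_000_000      # advertised peak QPS
bytes_per_response = 512        # average response size in bytes (assumption)
nic_gbps = 100                  # NIC bandwidth in Gbit/s (assumption)

nic_bytes_per_sec = nic_gbps * 1e9 / 8
qps_limit_from_nic = nic_bytes_per_sec / bytes_per_response

print(f"QPS limit from the NIC alone: {qps_limit_from_nic:,.0f}")
# About 24M QPS with these numbers, far below 1B, so the deployment
# still needs caches near the clients or more copies of the data.
```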
Not only that, a common mistake I see is: if your cluster has a maximum QPS per node and you require availability, let's say surviving z node failures, then you need X/Y + z nodes, otherwise your cluster will fail in a cascade. Adding z extra nodes adds latency, etc. (aside from everything else you mention, DB throughput is not everything).
Now that InnoDB seems to be completely replacing MEMORY and MyISAM, and there is more "complex" (please note the quotes) tablespace management, I would like to see some way to pin InnoDB tables/indexes/partitions in memory without relying on mounting memory.