Friday, July 17, 2020

Review of -- TimescaleDB: SQL made scalable for time-series data

This is a short review of TimescaleDB: SQL made scalable for time-series data. My focus is on indexing and I ignore many interesting things described in that paper. My goal in reading it was to understand possible optimizations for workloads like the insert benchmark v3. I also read some of the online TimescaleDB docs. This blog post is truthy as I am not an expert on TimescaleDB. Hopefully I haven't made too many mistakes. The paper is several years old and TimescaleDB has improved a lot since then.

The paper does a great job of asserting what the authors know to be true about the workloads that TimescaleDB targets.

Assertion 1:
Time-series data is largely immutable. New data continually arrives, typically corresponding to the latest time periods. In other words, writes primarily occur as new inserts, not as updates to existing rows. Further, while the database needs to be able to support backfill for delayed data, writes are made primarily to recent time intervals.
Assertion 2:
Workloads have a natural partitioning across both time and space. Writes typically are made to the latest time interval(s) and across the “partitioning key” in the space dimension (e.g., data sources, devices, users, etc.). Queries typically ask questions about a specific time series or data source, or across many data sources constrained to some time interval. Yet the queries might not be limited to a particular metric, but may regularly select multiple metrics at once (or use predicates that rely on multiple metrics).

Storage

Rows are stored in hypertables rather than tables, and hypertables are composed of chunks. A hypertable must have a time column and can have a partition column. The paper states that the partition column is required for clustered TimescaleDB, but the online docs have more nuance.

Chunks are right-sized (their name for it) to fit in memory. Without a partition column the hypertable is range partitioned on time into chunks. With the partition column the data is distributed by hash on the partition column and then range on the time column. In SQL DBMS partitioning terminology this is composite partitioning with: hash(partition), range(time).
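
As a sketch of what that looks like, using column names I made up for the simple workload discussed below (time, device_id, metric_id, metric_value) rather than anything from the paper, and based on my reading of the docs, the hypertable DDL is something like:

  CREATE TABLE metrics (
    time         TIMESTAMPTZ NOT NULL,
    device_id    INTEGER NOT NULL,
    metric_id    INTEGER NOT NULL,
    metric_value DOUBLE PRECISION
  );

  -- range partition on time, hash partition on device_id into 4 space
  -- partitions (an arbitrary choice). chunk_time_interval controls how
  -- chunks are right-sized.
  SELECT create_hypertable('metrics', 'time',
                           partitioning_column => 'device_id',
                           number_partitions   => 4,
                           chunk_time_interval => INTERVAL '1 day');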

But this isn't traditional partitioning because it is dynamic and automatic. Nobody has to run DDL to add, drop and change partitions. That is one way they add value.

Compression has been added since the paper was published. It is described here. I have yet to read that post but assume that old chunks are compressed while chunks still getting inserts remain in row format.

Secondary indexes are local to a chunk. I assume that means that fanout over (too) many chunks can happen when a query doesn't have a predicate on the partition column. But it also means that if the chunk fits in memory, then secondary index maintenance is not delayed by reading from storage and it is easier to sustain high ingest rates.
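
For example, an index created on the hypertable becomes a local index on each chunk. A sketch, reusing the made-up metrics table from above:

  -- TimescaleDB creates this index on every chunk of the hypertable
  CREATE INDEX ON metrics (metric_id, time DESC);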

Challenges

What is a good indexing strategy for a simple time series workload where new data has four attributes: time, deviceID, metricID, metricValue? I write about this in my post on a replacement for the insert benchmark. Assume that I will use a partition column in addition to the time column. My choices with TimescaleDB are:
  1. partition on deviceID and create local secondary indexes on metricID
  2. partition on metricID and create local secondary indexes on deviceID
I am not sure what I would do to make the min/max query fast. That query is explained in the Physical Schema - Read Optimized section of that post, and I sketch one form of it below.
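
This is only a guess at the shape of that query, not the exact one from my benchmark post, and it reuses the made-up schema from above with choice 1 (partition on device_id, local secondary index on metric_id):

  -- max value per metric for one device over a recent time range
  SELECT metric_id, max(metric_value)
  FROM metrics
  WHERE device_id = 1234
    AND time >= now() - INTERVAL '1 hour'
  GROUP BY metric_id;

With choice 1 the device_id predicate should prune the query to one space partition and a few time chunks, but I don't see an index that covers the rest of the work.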

Edits

I quickly read the post on columnar compression and I am impressed that they made this work in the context of Postgres. They use TOAST to store compressed columnar chunks off-page so IO is only done for the target columns. In terms of being surprised that this was feasible, it reminds me of the work to support columnar storage in SQL Server.
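
Based on my quick read of the docs, enabling it looks something like the sketch below, again on my made-up metrics table. The segmentby choice is my guess, and the policy function has different names across versions (add_compress_chunks_policy in older releases, add_compression_policy in newer ones):

  -- store old chunks in compressed columnar form, segmented by device_id
  ALTER TABLE metrics SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id'
  );

  -- compress chunks once they are old enough that inserts to them have stopped
  SELECT add_compression_policy('metrics', INTERVAL '7 days');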
