Small Datum: Indexing and write-heavy workloads

When I see impressive numbers for the insert rate that a DBMS can sustain I wonder what indexes exist and whether the inserts are in sequential or random order with respect to each index. One way to explain this is in terms of the numbers of points in the index at which the inserts occur. Although I use streams rather than insert points in what follows.

I am writing this in part so that I can reference this post in future performance reports when describing workloads. It isn't sufficient to state that inserts are in PK order. They can be in ascending or descending PK order. When ascending the point at which the inserts are done can be at the right end of the index (inserted keys > than existing keys) or somewhere in the middle of the index. When descending the inserts can be done at the left end of the index (inserted keys < existing keys) or somewhere in the middle of the index.

Explaining insert patterns

There are four attributes per index that can explain such insert patterns. The attributes are:

nAsc - number of streams for which inserts occur in ascending order WRT the index
nDesc - number of streams for which inserts occur in descending order WRT the index
nLHS - the number of descending streams that are at the left end of the index
nRHS - the number of ascending streams that are at the right end of the index

Constraints:

nAsc >= 0, nDesc >= 0 and (nAsc + nDesc) >= 1
nLHS and nRHS must be 0 or 1
if nLHS is 1 then nDesc must be >= 1
if nRHS is 1 then nAsc must be >= 1

There is one exception. When the insert pattern is random WRT the index then inf is used instead of the four attributes.

Geek Code

This is a geek code for explaining insert patterns. The attributes are specified per index. When there is only a PK index, named pk, and inserts occur in PK order at the right end of the index (right growing) then the geek code is:

pk=(nAsc:1, nDesc:0, nLHS:0, nRHS:1)

When there is only a PK index but the inserts are in random order WRT the PK then the geek code is:

pk=(inf)

To improve readability I omit attributes for which the value is 0. So these mean the same thing:

pk=(nAsc:1, nDesc:0, nLHS:0, nRHS:1)
pk=(nAsc:1, nRHS:1)

Why?

I am interested in this for three reasons. First, index maintenance has a big impact on insert performance whether or not the working set is in memory. Second, there are optimizations that a DBMS can do for some insert patterns and I suspect there is room for even more optimizations. Many storage engines optimize for right-growing inserts. In that case RocksDB with leveled compaction will have write amplification of 2 -- write once for the WAL, write again for the memtable flush, no compaction. Finally, this makes it easier to explain write-heavy workloads.

Steams and insert points

I use ordered arrays rather than indexes to explain streams (insert points). Assume the array starts as: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0], this represents the keys in the PK index and there are no secondary indexes. Some of the examples showwhy I use streams to describe this.

Examples:

Random

pk=(inf)
insert sequence: 1.5, 6.5, 1.7, 8.1, 0.0, 4.5, ...

Right growing

pk=(nAsc:1, nRHS:1)
insert sequence: 10.0, 11.0, 12.0, ...

Left growing

pk=(nDesc:1, nLHS:1)
insert sequence: 0.0, -1.0, -2.0, ...

Left & right growing

pk=(nAsc:1, nDesc:1, nLHS:1, nRHS:1)
insert sequence: 10.0, 0.0, -1.0, 11.0, 12.0, -2.0
insert sequence as interleaved streams: [10.0, 11.0, 12.0] and [0.0, -1.0, -2.0]

1 middle ascending

pk=(nAsc:1, nRHS:0)
insert sequence: 8.1, 8.11, 8.111, 8.1111, ...

1 middle descending

pk=(nDesc:1, nLHS:1)
insert sequence: 7.9, 7.89, 7.889, 7.8889, ...

1 middle ascending, 1 middle descending

pk=(nAsc:1, nDesc:1)
insert sequence: 8.1, 7.9, 8.11, 7.89, 8.111, 7.889, ...
insert sequence as interleaved streams: [8.1, 8.11, 8.111] and [7.9, 7.89, 7.889]

2 middle ascending:

pk=(nAsc:2)
insert sequence: 8.1, 6.1, 8.11, 6.11, 8.111, 6.111, ...
insert sequence as interleaved streams: [8.1, 8.11, 8.111] and [6.1, 6.11, 6.111]

N middle ascending

pk=(nAsc:N) for some finite value N

Explaining the insert benchmark

Until recently I ran the insert benchmark by first creating a PK index and 3 secondary indexes per table (or collection) and then doing inserts. Informally, the inserts were in PK order but random WRT to each secondary index. More formally, the insert pattern is the following when the secondary indexes are named s1, s2 and s3:

pk=(nAsc:1, nRHS:1)
s1=(inf)
s2=(inf)
s3=(inf)

The insert benchmark can become extremely IO-bound because of the random insert patterns for each of the secondary indexes. In the worst case with a B-Tree there is one page read and one page written back per secondary index per insert (3 pages read, 3 pages written back with 3 secondary indexes).

I recently changed the way that I run the insert benchmark to create the PK index, load some data, create secondary indexes and then load more data. In this case the insert pattern during load some data is:

pk=(nAsc:1, nRHS:1)

And then during load more data (with secondary indexes in place) is:

pk=(nAsc:1, nRHS:1)
s1=(inf)
s2=(inf)
s3=(inf)

Small Datum

Wednesday, July 15, 2020

Indexing and write-heavy workloads

No comments:

Post a Comment

Is it time for TPC-BLOB?