Sunday, January 12, 2025

Evaluating vector indexes in MariaDB and pgvector: part 1

This post has results for vector index support in MariaDB and Postgres. I am new to vector indexes so I will start small and over time add more results.  This work was done by Small Datum LLC and sponsored by the MariaDB Corporation.

I compare MariaDB with pgvector because I respect the work that the Postgres community has done to support vector search workloads. And I am happy to report that MariaDB has also done a great job on this. While I don't know the full story of the development effort, this feature came from the MariaDB Foundation and the community and it is wonderful to see that collaboration.

tl;dr

  • Performance for MariaDB is excellent
    • peak QPS for a given recall target is much better than pgvector
    • time to create the index to achieve a given recall target is much better than pgvector
  • MariaDB is easier to tune than pgvector
  • MariaDB needs more documentation
The good and the less good

The good for MariaDB starts with performance. The results I get are great and match the results in this blog post from upstream. Another good thing about MariaDB is that it is easier to tune. With pgvector I need to set M and ef_construction while creating an index and then ef_search while running a query. With MariaDB there is no option to set ef_construction, and evaluations are easier when there are fewer options to tune. Note that M with pgvector isn't the same thing as M with MariaDB, but they are similar (at least in spirit).
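
To make the difference concrete, here is a minimal sketch of the knobs each system exposes, using the same Python drivers as the benchmark client. The table and column names (items, embedding, t, v) are made up, and the MariaDB index-option and session-variable syntax (VECTOR INDEX ... M=..., mhnsw_ef_search) is my reading of the 11.7 docs, so treat it as illustrative rather than authoritative.

# Hedged sketch of the tuning knobs; table/column names are made up.
import psycopg   # pip3 install pgvector psycopg
import mariadb   # pip3 install mariadb

# pgvector: M and ef_construction are index build options, ef_search is set
# per session (or per query).
with psycopg.connect("dbname=ib user=root password=pw host=127.0.0.1 port=5432") as pg:
    pg.execute("CREATE INDEX ON items USING hnsw (embedding vector_l2_ops) "
               "WITH (m = 16, ef_construction = 192)")
    pg.execute("SET hnsw.ef_search = 120")

# MariaDB: M (and the distance function) are index options, there is no
# ef_construction, and the search-time knob is a session variable.
conn = mariadb.connect(user="root", password="pw", host="127.0.0.1", port=3306, database="test")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INT PRIMARY KEY, v VECTOR(784) NOT NULL, "
            "VECTOR INDEX (v) M=6 DISTANCE=euclidean)")
cur.execute("SET mhnsw_ef_search = 10")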

The less good for MariaDB is the documentation. We need more, but that is easy to fix. I have been using:

Things that need more documentation
  • the algorithm is described as a variation of HNSW (MHNSW) and the community will benefit from more details on what has been changed. For example, I can't set ef_construction and it always uses float16 in the index. But with pgvector I can use either float32 (default) or float16 (via halfvec).
  • What transaction isolation levels are supported? I get that workloads will be read-heavy but I assume that some won't be read-only, so I want to know whether repeatable read and read committed are provided.
  • What concurrent operations are supported? Obviously, reads can be concurrent with other reads. But can writes be concurrent with reads or with other writes on the same table?
Hardware

The hardware is a Beelink SER7 7840HS with a Ryzen 7 7840HS CPU, 32G of RAM and Ubuntu 22.04.

While I already installed some dependencies on this server long ago, to run this benchmark I did:

sudo apt install libmariadb3 libmariadb-dev

pip3 install mariadb

pip3 install pgvector psycopg


Database software

For Postgres I compiled version 17.2 and pgvector 0.8.0 from source. Files that I used include:
For MariaDB I compiled version 11.7.1 from source. Files that I used include:
Benchmark

I am using the ann-benchmarks via my fork of a fork of a fork at this commit. These forks have changes to run the benchmark for MariaDB and pgvector without using Docker containers. A request I have for anyone writing a benchmark client is to limit dependencies, or at least make them optional. I just want to point the benchmark client at an existing installation.

The ann-benchmarks configuration files are here for MariaDB and for pgvector. I am open to feedback that I should try different parameters. I added support to use float16 (halfvec) for the pgvector index (but only the index, not for the vector stored in the table).

In this post I use the fashion-mnist-784-euclidean dataset in non-batch mode (non-batch mode doesn't run concurrent queries, batch mode does). Note that I don't set --count when running the benchmark, which means that all of the queries use LIMIT 10.
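
To make the query shape concrete, the SQL issued by the client looks roughly like the sketch below. The table and column names are made up, and the MariaDB function names reflect my reading of the 11.7 docs.

# Rough shape of the benchmark queries (LIMIT 10 comes from the default --count).
# Table/column names are illustrative; a real query binds a 784-dim vector.
pgvector_sql = ("SELECT id FROM items "
                "ORDER BY embedding <-> %s::vector "
                "LIMIT 10")

# My reading of the MariaDB 11.7 syntax: VEC_FromText parses a JSON-style vector
# literal and VEC_DISTANCE_EUCLIDEAN matches the DISTANCE=euclidean index option.
mariadb_sql = ("SELECT id FROM t "
               "ORDER BY VEC_DISTANCE_EUCLIDEAN(v, VEC_FromText(?)) "
               "LIMIT 10")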

The command lines for non-batch mode are:
POSTGRES_CONN_ARGS=root:pw:127.0.0.1:5432 POSTGRES_DB_NAME=ib \
    python3 -u run.py  --algorithm pgvector --dataset fashion-mnist-784-euclidean --local

POSTGRES_CONN_ARGS=root:pw:127.0.0.1:5432 POSTGRES_DB_NAME=ib \
    python3 -u run.py  --algorithm pgvector_halfvec --dataset fashion-mnist-784-euclidean --local

MARIADB_CONN_ARGS=root:pw:127.0.0.1:3306 MARIADB_DB_NAME=test \
    python3 -u run.py  --algorithm mariadb --dataset fashion-mnist-784-euclidean --local

Results: QPS vs recall graphs

The recall vs QPS graph is created by: python3 plot.py --dataset fashion-mnist-784-euclidean

The results below for MariaDB are excellent. It gets more QPS than pgvector at a given recall target.


Results: create index

I am still trying to figure out how to present this data. All of the numbers are here for the time to create an index and the size of the index. The summary is:
  • The MariaDB index and the Postgres index with halfvec have a similar size, while the Postgres index without halfvec is about 2X larger (a sketch for checking index sizes follows this list)
  • Time to create the index for Postgres is similar with and without halfvec
  • Time to create the index for MariaDB is less than for pgvector. Perhaps the best way to compare this is the time to create the index for a similar point on the QPS/recall graph (see the last section of this blog post)
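Here is a minimal sketch, with made-up index and table names, for checking index size in each system: pg_relation_size for the pgvector index, and Index_length from SHOW TABLE STATUS for MariaDB (which counts all indexes on the table, not just the vector index).

# Sketch for checking index sizes; index/table names are made up.
import psycopg
import mariadb

with psycopg.connect("dbname=ib user=root password=pw host=127.0.0.1 port=5432") as pg:
    size = pg.execute("SELECT pg_relation_size('items_embedding_idx')").fetchone()[0]
    print("pgvector index bytes:", size)

conn = mariadb.connect(user="root", password="pw", host="127.0.0.1", port=3306, database="test")
cur = conn.cursor()
cur.execute("SHOW TABLE STATUS LIKE 't'")
print("MariaDB Index_length bytes:", cur.fetchone()[8])  # Index_length is the 9th column
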
Results: best QPS for a given recall

Many benchmark results are marketed via peak performance (max throughput or min response time) but these are usually constrained optimization problems -- determine peak performance that satisfies some SLA. And the SLA might be response time or efficiency (cost).

With ann-benchmarks the constraint might be recall where you seek the best QPS that satisfies a recall target. Below I share the best QPS for a given recall target along with the configuration parameters (M, ef_construction, ef_search) at which that occurs for each of the algorithms (MariaDB, pgvector with float32, pgvector with float16).
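
The selection itself is simple: for each algorithm, take every (recall, QPS) point produced by the parameter sweep, keep the points that meet the recall target, and report the one with the highest QPS. A small sketch with hypothetical numbers:

# Sketch of "best QPS for a given recall"; the data points here are hypothetical.
def best_qps(points, recall_target):
    # points is a list of (recall, qps, config) tuples from the parameter sweep
    ok = [p for p in points if p[0] >= recall_target]
    return max(ok, key=lambda p: p[1]) if ok else None

points = [(0.993, 2400, "m=16, ef_search=20"), (0.999, 950, "m=16, ef_search=120")]
print(best_qps(points, 0.99))   # -> (0.993, 2400, 'm=16, ef_search=20')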

For all cases below except the first (recall = 1.000) the time to create the index is about 2X or more longer with pgvector than with MariaDB.

For all cases below the best QPS at a given recall target is much better, sometimes 2X better, with MariaDB than with pgvector.

Legend
* recall & QPS - results from the benchmark
* isecs - number of seconds to create the index for M and ef_construction (ef_cons)

Best QPS with recall >= 1.000
recall  QPS  isecs  algorithm
1.000   920  97.4   PGVector(m=16, ef_cons=192, ef_search=120)
1.000   991  89.8   PGVector_halfvec(m=16, ef_cons=256, ef_search=120)
1.000  3226 111.8   MariaDB(m=32, ef_search=10)

Best QPS with recall >= 0.99
recall  QPS  isecs  algorithm
0.990  2337  70.7   PGVector(m=16, ef_cons=96, ef_search=20)
0.991  2558  78.3   PGVector_halfvec(m=16, ef_cons=192, ef_search=20)
0.995  4745  27.4   MariaDB(m=12, ef_search=10)

Best QPS with recall >= 0.98
recall  QPS  isecs  algorithm
0.985  2383  50.4   PGVector(m=16, ef_cons=32, ef_search=20)
0.988  2608  52.8   PGVector_halfvec(m=16, ef_cons=64, ef_search=20)
0.984  5351  18.1   MariaDB(m=8, ef_search=10)

Best QPS with recall >= 0.97
recall  QPS  isecs  algorithm
0.972  3033  42.3   PGVector(m=8, ef_cons=96, ef_search=20)
0.973  3185  89.8   PGVector_halfvec(m=16, ef_cons=256, ef_search=10)
0.984  5351  18.1   MariaDB(m=8, ef_search=10)

Best QPS with recall >= 0.96
recall  QPS  isecs  algorithm
0.961  3044  50.4   PGVector(m=16, ef_cons=32, ef_search=10)
0.967  3250  52.8   PGVector_halfvec(m=16, ef_cons=64, ef_search=10)
0.961  5691  14.8   MariaDB(m=6, ef_search=10)

Best QPS with recall >= 0.95
recall  QPS  isecs  algorithm
0.961  3044  50.4   PGVector(m=16, ef_cons=32, ef_search=10)
0.967  3250  52.8   PGVector_halfvec(m=16, ef_cons=64, ef_search=10)
0.961  5691  14.8   MariaDB(m=6, ef_search=10)

Thursday, January 9, 2025

Sysbench performance over time for InnoDB and MyRocks: part 4

This is part 4 in my (possibly) final series on performance regressions in MySQL using cached sysbench as the workload. For previous posts, see part 1, part 2 and part 3. This post covers performance differences between InnoDB in upstream MySQL 8.0.32, InnoDB in FB MySQL 8.0.32 and MyRocks in FB MySQL 8.0.32 using a server with 32 cores and 128G of RAM.

I don't claim that the MyRocks CPU overhead isn't relevant, but this workload (CPU-bound, database is cached) is a worst-case for it.

tl;dr 

  • InnoDB from FB MySQL is no worse than ~10% slower than InnoDB from upstream
  • Fixing bug 1506 is important for InnoDB in FB MySQL
  • MyRocks is ~30% slower than upstream InnoDB at low concurrency and ~45% slower at high, as it uses ~1.5X more CPU/query
  • For writes, MyRocks does worse at high concurrency than at low
Updates: For writes, MyRocks does worse at high concurrency than at low

I looked at vmstat metrics for the update-nonindex benchmark and the number of context switches per update is about 1.2X larger for MyRocks vs InnoDB at high concurrency. 
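
The metric is simple: the vmstat cs rate divided by the sysbench update rate over the same interval. A back-of-the-envelope sketch with made-up numbers:

# Context switches per update = vmstat "cs" rate / sysbench update rate.
# The numbers below are hypothetical, not measurements from this post.
vmstat_cs_per_sec = 150_000
sysbench_updates_per_sec = 100_000
print(vmstat_cs_per_sec / sysbench_updates_per_sec)   # -> 1.5 context switches per update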

Then I looked at PMP stacks and MyRocks has more samples for commit processing. The top stacks are here. This should not be a big surprise because MyRocks does more work at commit time (pushes changes from a per-session buffer into the memtable). But I need to look at this more closely.

I browsed the code in Commit_stage_manager::enroll_for, which is on the call stack for the mutex contention, and it is kind of complicated. I am trying to figure out how many mutexes are locked in there and figuring that out will take some time.

Benchmark, Hardware

Much more detail on the benchmark and hardware is here. I am trying to avoid repeating that information in the posts that follow. 

Results here are from the c32r128 server with 32 CPU cores and 128G of RAM. The benchmarks were repeated for 1 and 24 threads. On the charts below that is indicated by NT=1 and NT=24.

Builds

The previous post has more detail on the builds, my.cnf files and bug fixes.

The encoded names for these builds are:
  • my8032_rel_o2nofp
    • InnoDB from upstream MySQL 8.0.32
  • fbmy8032_rel_o2nofp_end_241023_ba9709c9_971
    • FB MySQL 8.0.32 at git hash ba9709c9 (as of 2024/10/23) using RocksDB 9.7.1. This supports InnoDB and MyRocks.
  • fbmy8032_rel_o2nofp_241023_ba9709c9_971_bug1473_1481_1482_1506
    • FB MySQL 8.0.32 at git hash ba9709c9 (as of 2024/10/23) using RocksDB 9.7.1 with patches applied for bugs 1473, 1481, 1482 and 1506. This supports InnoDB and MyRocks.
The my.cnf files are:
Relative QPS

The charts and summary statistics that follow use a number that I call the relative QPS (rQPS) where:
  • rQPS is: (QPS for my version) / (QPS for base version)
  • base version is InnoDB from upstream MySQL 8.0.32 (my8032_rel_o2nofp)
  • my version is one of the other versions
Results

The microbenchmarks are split into three groups: point queries, range queries, writes. The tables below have summary statistics for InnoDB and MyRocks using the relative QPS (the same data as the charts).

Results are provided in two formats: charts and summary statistics. The summary statistics tables have the min, max, average and median relative QPS per group (group = point, range and writes).
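
As a sketch of how those summary statistics are produced (the per-microbenchmark relative QPS values below are hypothetical):

# Each microbenchmark gets rQPS = (QPS for my version) / (QPS for base version);
# the tables report min, max, average and median per group. Values are hypothetical.
from statistics import mean, median

rqps = {
    "point":  [0.92, 0.95, 0.96],
    "range":  [0.78, 0.89, 0.91],
    "writes": [0.86, 0.90, 0.95],
}
for group, values in rqps.items():
    print(group, min(values), max(values), round(mean(values), 2), round(median(values), 2))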

The spreadsheets and charts are also here. I don't know how to prevent the microbenchmark names on the x-axis from getting truncated in the png files I use here but they are easier to read on the spreadsheet.

The charts use NT=1 and NT=24 to indicate whether sysbench was run with 1 or 24 threads. The charts and table use the following abbreviations for the DBMS versions:
  • fbinno-nofix
    • InnoDB from fbmy8032_rel_o2nofp_end_241023_ba9709c9_971
  • fbinno-somefix
    • InnoDB from fbmy8032_rel_o2nofp_241023_ba9709c9_971_bug1473_1481_1482_1506
  • myrocks-nofix
    • MyRocks from fbmy8032_rel_o2nofp_end_241023_ba9709c9_971
  • myrocks-somefix
    • MyRocks from fbmy8032_rel_o2nofp_241023_ba9709c9_971_bug1473_1481_1482_1506

Summary statistics: InnoDB

Summary:

  • InnoDB from FB MySQL is no worse than ~10% slower than InnoDB from upstream
  • Fixing bug 1506 is important for InnoDB in FB MySQL
1 thread

fbinno-nofix    min   max   average  median
point           0.89  0.96  0.92     0.91
range           0.63  0.93  0.82     0.82
writes          0.86  0.98  0.89     0.88

fbinno-somefix  min   max   average  median
point           0.92  1.00  0.96     0.95
range           0.89  0.96  0.91     0.91
writes          0.89  0.99  0.92     0.92

24 threads

fbinno-nofix    min   max   average  median
point           0.92  0.96  0.94     0.94
range           0.62  0.96  0.81     0.82
writes          0.84  0.94  0.88     0.87

fbinno-somefix  min   max   average  median
point           0.94  0.99  0.97     0.98
range           0.78  0.99  0.89     0.91
writes          0.86  0.95  0.90     0.88

Summary statistics: MyRocks

Summary:

  • MyRocks does better at low concurrency than at high. The fix might be as simple as enabling the hyper clock block cache
  • MyRocks is ~30% slower than upstream InnoDB at low concurrency and ~45% slower at high
  • For writes, MyRocks does worse at high concurrency than at low
1 thread

myrocks-nofix    min   max   average  median
point            0.52  0.75  0.66     0.68
range            0.37  0.72  0.60     0.60
writes           0.65  1.21  0.79     0.73

myrocks-somefix  min   max   average  median
point            0.51  0.79  0.68     0.70
range            0.43  0.76  0.62     0.61
writes           0.66  1.23  0.80     0.74

24 threads

myrocks-nofix    min   max   average  median
point            0.40  0.76  0.49     0.43
range            0.40  0.71  0.58     0.60
writes           0.44  1.37  0.65     0.55

myrocks-somefix  min   max   average  median
point            0.48  0.77  0.55     0.51
range            0.43  0.71  0.60     0.60
writes           0.45  1.39  0.66     0.55

Results: c32r128 with InnoDB and point queries

Summary
  • InnoDB from FB MySQL is no worse than 10% slower than upstream

Results: c32r128 with MyRocks and point queries

Summary
  • at low concurrency the worst cases for MyRocks are the tests that do point lookups on secondary indexes, because those use a range scan rather than a point lookup on the LSM tree, which means that bloom filters cannot be used
  • at high concurrency the difference between primary and secondary index queries is less significant; perhaps this is dominated by mutex contention from the LRU block cache and solved by using hyper clock

Results: c32r128 with InnoDB and range queries

Summary

  • the worst cases for InnoDB from FB MySQL are the long range scans, and fixing bug 1506 will be a big deal

Results: c32r128 with MyRocks and range queries

Summary

  • while long range scans are the worst case here, bug 1506 is not an issue as that is InnoDB-only

Results: c32r128 with InnoDB and writes

Summary

  • results are stable here, InnoDB from FB MySQL is no worse than ~10% slower than upstream but results at high concurrency are a bit worse than at low

Results: c32r128 with MyRocks and writes

Summary

  • MyRocks does much better than InnoDB for update-index because it does blind writes rather than read-modify-write (RMW) for non-unique secondary index maintenance
  • MyRocks does worse at high concurrency than at low




Sysbench performance over time for InnoDB and MyRocks: part 3

This is part 3 in my (possibly) final series on performance regressions in MySQL using cached sysbench as the workload. For previous posts, see part 1 and part 2. This post covers performance differences between InnoDB in upstream MySQL 8.0.32, InnoDB in FB MySQL 8.0.32 and MyRocks in FB MySQL 8.0.32 using a server with 24 cores and 64G of RAM.

I don't claim that the MyRocks CPU overhead isn't relevant, but this workload (CPU-bound, database is cached) is a worst-case for it.

tl;dr 

  • InnoDB from FB MySQL is no worse than ~10% slower than InnoDB from upstream
  • MyRocks is ~35% slower than InnoDB from upstream as it uses ~1.5X more CPU/query
  • Fixing bug 1506 is important for InnoDB in FB MySQL
  • For writes, MyRocks does worse at high concurrency than at low
  • while MyRocks does much better than InnoDB for update-index at 1 thread, that benefit goes away at 16 threads
Benchmark, Hardware

Much more detail on the benchmark and hardware is here. I am trying to avoid repeating that information in the posts that follow. 

Results here are from the c24r64 server with 24 CPU cores and 64G of RAM. The benchmarks were repeated for 1 and 16 threads. On the charts below that is indicated by NT=1 and NT=16.

Builds

The previous post has more detail on the builds, my.cnf files and bug fixes.

The encoded names for these builds are:
  • my8032_rel_o2nofp
    • InnoDB from upstream MySQL 8.0.32
  • fbmy8032_rel_o2nofp_end_241023_ba9709c9_971
    • FB MySQL 8.0.32 at git hash ba9709c9 (as of 2024/10/23) using RocksDB 9.7.1. This supports InnoDB and MyRocks.
  • fbmy8032_rel_o2nofp_241023_ba9709c9_971_bug1473_1481_1482_1506
    • FB MySQL 8.0.32 at git hash ba9709c9 (as of 2024/10/23) using RocksDB 9.7.1 with patches applied for bugs 1473, 1481, 1482 and 1506. This supports InnoDB and MyRocks.
The my.cnf files are:
Relative QPS

The charts and summary statistics that follow use a number that I call the relative QPS (rQPS) where:
  • rQPS is: (QPS for my version) / (QPS for base version)
  • base version is InnoDB from upstream MySQL 8.0.32 (my8032_rel_o2nofp)
  • my version is one of the other versions
Results

The microbenchmarks are split into three groups: point queries, range queries, writes. The tables below have summary statistics for InnoDB and MyRocks using the relative QPS (the same data as the charts).

Results are provided in two formats: charts and summary statistics. The summary statistics tables have the min, max, average and median relative QPS per group (group = point, range and writes).

The spreadsheets and charts are also here. I don't know how to prevent the microbenchmark names on the x-axis from getting truncated in the png files I use here but they are easier to read on the spreadsheet.

The charts use NT=1 and NT=16 to indicate whether sysbench was run with 1 or 16 threads. The charts and table use the following abbreviations for the DBMS versions:
  • fbinno-nofix
    • InnoDB from fbmy8032_rel_o2nofp_end_241023_ba9709c9_971
  • fbinno-somefix
    • InnoDB from fbmy8032_rel_o2nofp_241023_ba9709c9_971_bug1473_1481_1482_1506
  • myrocks-nofix
    • MyRocks from fbmy8032_rel_o2nofp_end_241023_ba9709c9_971
  • myrocks-somefix
    • MyRocks from fbmy8032_rel_o2nofp_241023_ba9709c9_971_bug1473_1481_1482_1506

Summary statistics: InnoDB

Summary:

  • InnoDB from FB MySQL is no worse than 9% slower than InnoDB from upstream
  • Fixing bug 1506 is important for InnoDB in FB MySQL

1 thread

fbinno-nofix    min   max   average  median
point           0.88  1.01  0.94     0.95
range           0.68  0.97  0.83     0.83
writes          0.86  0.95  0.90     0.89

fbinno-somefix  min   max   average  median
point           0.94  1.05  0.97     0.96
range           0.88  1.03  0.92     0.91
writes          0.88  0.96  0.92     0.93

16 threads

fbinno-nofix    min   max   average  median
point           0.93  0.96  0.94     0.94
range           0.65  0.95  0.83     0.85
writes          0.88  0.94  0.91     0.91

fbinno-somefix  min   max   average  median
point           0.94  0.97  0.95     0.95
range           0.85  0.96  0.91     0.91
writes          0.89  0.96  0.92     0.91

Summary statistics: MyRocks

Summary

  • MyRocks does better at low concurrency than at high. The fix might be as simple as enabling the hyper clock block cache
  • MyRocks is ~35% slower than upstream InnoDB
  • For writes, MyRocks does worse at high concurrency than at low

1 thread

myrocks-nofix    min   max   average  median
point            0.46  0.78  0.67     0.70
range            0.48  0.73  0.63     0.64
writes           0.65  1.49  0.81     0.73

myrocks-somefix  min   max   average  median
point            0.46  0.78  0.66     0.69
range            0.51  0.73  0.65     0.64
writes           0.66  1.54  0.82     0.74

16 threads

myrocks-nofix    min   max   average  median
point            0.52  0.77  0.63     0.63
range            0.46  0.73  0.63     0.61
writes           0.51  1.01  0.67     0.61

myrocks-somefix  min   max   average  median
point            0.55  0.79  0.63     0.62
range            0.53  0.74  0.65     0.65
writes           0.50  1.01  0.67     0.62

Results: c24r64 with InnoDB and point queries

Summary

  • results are stable here, InnoDB from FB MySQL is no worse than 10% slower than upstream

Results: c24r64 with MyRocks and point queries

Summary

  • the worst cases for MyRocks are the tests that do point lookups on secondary indexes, because those use a range scan rather than a point lookup on the LSM tree, which means that bloom filters cannot be used

Results: c24r64 with InnoDB and range queries

Summary

  • the worst cases for InnoDB from FB MySQL are the long range scans, and fixing bug 1506 will be a big deal

Results: c24r64 with MyRocks and range queries

Summary

  • while long range scans are the worst case here, bug 1506 is not an issue as that is InnoDB-only

Results: c24r64 with InnoDB and writes

Summary

  • results are stable here, InnoDB from FB MySQL is no worse than ~10% slower than upstream

Results: c24r64 with MyRocks and writes

Summary

  • while MyRocks does much better than InnoDB for update-index at 1 thread, that benefit goes away at 16 threads. It does better at update-index because it does blind writes rather than RMW for non-unique secondary index maintenance. Perhaps the issue at high concurrency is memory system stalls because this server has 2 sockets.


