Wednesday, February 26, 2025

Feedback from the Postgres community about the vector index benchmarks

This is a response to some of the feedback I received from the Postgres community about my recent benchmark results for vector indexes using MariaDB and Postgres (pgvector). The work here isn't sponsored and required ~2 weeks of server time and a few more hours of my time (yes, I contribute to the PG community).

tl;dr

  • index create is ~4X faster when using ~4 parallel workers. I hope that parallel DDL comes to more open source DBMS.
  • parallel query does not help pgvector
  • increasing work_mem does not help pgvector
  • pgvector gets less QPS than MariaDB because it uses more CPU to compute the distance metric
The feedback

The feedback I received includes
  • benchmarks are useless, my production workload is the only relevant workload
    • I disagree, but don't expect consensus. This approach also means you are unlikely to do comparative benchmarks, because it costs too much to port your workload to some other DBMS just for the sake of a benchmark, and since only your team can run the workload you will miss out on expertise from elsewhere.
  • this work was sponsored so I don't trust it
    • There isn't much I can do about that.
  • this is benchmarketing!
    • There is a marketing element to this work. Perhaps I was not the first to define benchmarketing, but by my definition a benchmark report is not benchmarketing when the results are explained. I have been transparent about how I ran the benchmark and shared some performance debugging results here. MariaDB gets more QPS than pgvector because pgvector uses more CPU to compute the distance metric. 
  • you should try another Postgres extension for vector indexes
    • I hope to do that eventually. But pgvector is a great choice here because it implements HNSW and MariaDB implements modified HNSW. Regardless, time is finite, the number of servers I have is finite, and my work here (both sponsored and volunteer) competes with my need to finish my taxes, play video games, sleep, etc.
  • this result is bogus because I didn't like some config setting you used
    • Claims like this are cheap to make and expensive to debunk. Sometimes the suggested changes make a big difference but that has been rare in my experience. Most of the time the changes at best make a small difference.
  • this result is bogus because you used Docker
    • I am not a Docker fan because I suspect it is overused and I prefer benchmark setups that don't require it, but I only used Docker for Qdrant. While the ann-benchmarks page states that Docker is used for all algorithms, it is not used for MariaDB or Postgres in my fork. And it is trivial, but time consuming, to update most of ann-benchmarks to not use Docker (see the sketch below).
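A minimal sketch of running ann-benchmarks without Docker, using the --local flag from upstream ann-benchmarks. The algorithm and dataset names below match the ones used in this post:

    # run one algorithm locally, without Docker
    python run.py --local --algorithm pgvector --dataset dbpedia-openai-1000k-angular
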
Claims about the config that I used include:
  • huge pages were disabled
    • Yes they were. Enabling huge pages would have helped both MariaDB and Postgres. But I prefer to not use them because the feature is painful (unless you like OOM). Posts from me on this are here and here.
  • track_io_timing was enabled and is expensive
    • There wasn't IO when queries ran as the database was cached, so this is irrelevant. There was some write IO when the index was created. While I won't repeat tests with track_io_timing disabled, I am skeptical that the latency of two calls to gettimeofday() per IO is significant on modern Linux.
  • autovacuum_vacuum_cost_limit is too high
    • I set it to 4000 because I often run write-intensive benchmarks on servers with a large IOPS capacity. This is the first time anyone suggested the value was too high, and experts have reviewed my config files. I wish that Postgres vacuum didn't require so much tuning -- and all of that tuning means that someone can always claim your tuning is wrong. Regardless, the benchmark workload is load, create index, query and I only time the create index and query steps. There is little impact from vacuum on this benchmark. I have also used hundreds of server hours to search for good Postgres configs.
  • parallel operations were disabled
    • Parallel query isn't needed for the workloads I normally run, but parallel index create is nice to have. I disable parallel index create for Postgres because my primary focus is efficiency -- how much CPU and IO is consumed per operation -- but I haven't been clear on that in my blog posts. Regardless, below I show there is a big impact from parallel index create and no impact from parallel query.
  • work_mem is too low
    • I have been using the default, which is fine for the workloads I normally run (sysbench, insert benchmark). Below I show there is no impact from increasing it to 8M. A sketch for inspecting the settings discussed above follows this list.
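As promised above, a minimal sketch for inspecting the settings that were debated. The psql connection flags (host, database, user) are omitted and depend on your setup:

    # transparent huge pages state at the OS level
    cat /sys/kernel/mm/transparent_hugepage/enabled
    # the Postgres settings mentioned in the feedback
    psql -c "SHOW huge_pages;"
    psql -c "SHOW track_io_timing;"
    psql -c "SHOW autovacuum_vacuum_cost_limit;"
    psql -c "SHOW work_mem;"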

Benchmark

This post has much more detail about my approach in general. I ran the benchmark for 1 session. I use ann-benchmarks via my fork of a fork of a fork at this commit. I used the dbpedia-openai dataset with 1M rows. It uses angular (cosine) for the distance metric. The ann-benchmarks config file for Postgres is here, and in this case I only have results for pgvector with halfvec (float16).
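For context, a hedged sketch of the pgvector DDL that the (M, ef_construction, ef_search) parameters map to. The table and column names are hypothetical; halfvec, the hnsw access method and hnsw.ef_search are from pgvector:

    psql <<'SQL'
    CREATE TABLE items (id bigserial PRIMARY KEY, embedding halfvec(1536));
    -- halfvec_cosine_ops matches the angular (cosine) distance metric
    CREATE INDEX ON items USING hnsw (embedding halfvec_cosine_ops)
      WITH (m = 16, ef_construction = 64);
    SET hnsw.ef_search = 40;  -- query-time knob varied by ann-benchmarks
    SQL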

I used a large server (Hetzner ax162-s) with 48 cores, 128G of RAM, Ubuntu 22.04 and HW RAID 10 using 2 NVMe devices. I tested three configurations for Postgres and all of the settings are here:
  • def
    • def stands for default and is the config I used in all of my previous blog posts. Thus, it is the config for which I received feedback.
  • wm8
    • wm8 stands for work_mem increased to 8MB. The default (used by def) is 4MB.
  • pq4
  • pq4
    • pq4 stands for Parallel Query with ~4 workers. Here I changed a few settings from def to support that (a sketch is below).
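The sketch below shows the kind of settings a pq4-style config changes. My linked config files are authoritative; treat the values here as illustrative:

    psql <<'SQL'
    ALTER SYSTEM SET max_parallel_maintenance_workers = 4;  -- parallel index create
    ALTER SYSTEM SET max_parallel_workers_per_gather = 4;   -- parallel query
    SELECT pg_reload_conf();
    SQL
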
Output from the benchmark is here.

The command lines to run the benchmark using my helper scripts are:
    bash rall.batch.sh v1 dbpedia-openai-1000k-angular c32r128

Results: QPS vs recall

These charts show the best QPS for a given recall. The graphs appear to be the same, but the differences are harder to see as recall approaches 1.0, so the next section has a table with numbers.

But from these graphs, QPS doesn't improve with the wm8 or pq4 configs.


The chart for the def config, which is what I used in previous blog posts.
The chart for the wm8 config with work_mem=8M
The chart for the pq4 config that uses ~4 parallel workers

Results: best QPS for a given recall

Many benchmark results are marketed via peak performance (max throughput or min response time) but these are usually constrained optimization problems -- determine peak performance that satisfies some SLA. And the SLA might be response time or efficiency (cost).

With ann-benchmarks the constraint is recall. Below I share the best QPS for a given recall target along with the configuration parameters (M, ef_construction, ef_search) at which that occurs.
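A minimal sketch of that selection rule, assuming a whitespace-separated results file (the file name is hypothetical) with one row per run: recall, QPS, config:

    # keep the highest QPS among rows that meet the recall target
    awk -v target=0.99 '$1 >= target && $2 > best {best = $2; line = $0} END {print line}' results.txt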

Summary
  • pgvector does not get more QPS with parallel query
  • pgvector does not get more QPS with a larger value for work_mem
Legend:
  • recall, QPS - best QPS at that recall
  • isecs - time to create the index in seconds
  • m= - value for M when creating the index
  • ef_cons= - value for ef_construction when creating the index
  • ef_search= - value for ef_search when running queries
  • each group below has one line per config, in the order: def, wm8, pq4
Best QPS with recall >= 1.000
no algorithm achieved this for the (M, ef_construction) settings I used

Best QPS with recall >= 0.99
recall  QPS     isecs   config
0.990   317.9   7302    PGVector_halfvec(m=48, ef_cons=256, ef_search=40)
0.990   320.4   7285    PGVector_halfvec(m=48, ef_cons=256, ef_search=40)
0.990   316.9   1565    PGVector_halfvec(m=48, ef_cons=256, ef_search=40)

Best QPS with recall >= 0.98
0.983   412.0   4120    PGVector_halfvec(m=32, ef_cons=192, ef_search=40)
0.984   415.6   4168    PGVector_halfvec(m=32, ef_cons=192, ef_search=40)
0.984   411.4    903    PGVector_halfvec(m=32, ef_cons=192, ef_search=40)

Best QPS with recall >= 0.97
0.978   487.3   5070    PGVector_halfvec(m=32, ef_cons=256, ef_search=30)
0.970   508.3   2495    PGVector_halfvec(m=32, ef_cons=96, ef_search=30)
0.970   508.4   2495    PGVector_halfvec(m=32, ef_cons=96, ef_search=30)

Best QPS with recall >= 0.96
0.961   621.1   4120    PGVector_halfvec(m=32, ef_cons=192, ef_search=20)
0.962   632.3   4168    PGVector_halfvec(m=32, ef_cons=192, ef_search=20)
0.962   622.0    903    PGVector_halfvec(m=32, ef_cons=192, ef_search=20)

Best QPS with recall >= 0.95
0.951   768.7   2436    PGVector_halfvec(m=16, ef_cons=192, ef_search=30)
0.952   770.2   2442    PGVector_halfvec(m=16, ef_cons=192, ef_search=30)
0.953   753.0    547    PGVector_halfvec(m=16, ef_cons=192, ef_search=30)

Results: create index

The database configs for Postgres are shared above; parallel index create is disabled except in the pq4 config because my focus has not been on DDL performance. Regardless, it works great for Postgres with pgvector. The summary is:
  • index create is ~4X faster when using ~4 parallel workers
  • index sizes are similar with and without parallel create index
Sizes: table is ~8G and index is ~4G

Legend
  • M - value for M when creating the index
  • cons - value for ef_construction when creating the index
  • secs - time in seconds to create the index, per config (def, wm8, pq4)
                def     wm8     pq4
M       cons    secs    secs    secs
 8       32      412     405     108
16       32      649     654     155
 8       64      624     627     154
16       64     1029    1029     237
32       64     1901    1895     412
 8       96      834     835     194
16       96     1387    1393     312
32       96     2497    2495     541
48       96     3731    3726     798
 8      192     1409    1410     316
16      192     2436    2442     547
32      192     4120    4168     903
48      192     6117    6119    1309
64      192     7838    7815    1662
 8      256     1767    1752     400
16      256     3146    3148     690
32      256     5070    5083    1102
48      256     7302    7285    1565
64      256     9959    9946    2117
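For anyone who wants to reproduce the size numbers, a sketch of how table and index sizes can be measured from SQL. The relation names are hypothetical:

    psql -c "SELECT pg_size_pretty(pg_table_size('items'));"
    psql -c "SELECT pg_size_pretty(pg_relation_size('items_embedding_idx'));"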

Thursday, February 20, 2025

How to find Lua scripts for sysbench using LUA_PATH

sysbench is a great tool for benchmarks and I appreciate all of the work the maintainer (Alexey Kopytov) put into it as that is often a thankless task. Today I struggled to figure out how to load Lua scripts from something other than the default location that was determined when sysbench was compiled. It turns out that LUA_PATH is the thing to set, but the syntax isn't what I expected.

My first attempt was this, because the PATH in LUA_PATH implies directory names. But that failed.
  LUA_PATH="/mnt/data/sysbench.lua/lua" sysbench ... oltp_insert run

It turns out that LUA_PATH uses special semantics and this worked:
  LUA_PATH="/mnt/data/sysbench.lua/lua/?.lua" sysbench ... oltp_insert run


The usage above replaces the existing search path. The usage below prepends the new path to the existing (compiled in) path:

  LUA_PATH="/mnt/data/sysbench.lua/lua/?.lua;;" sysbench ... oltp_insert run
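The reason this works: Lua replaces each '?' in a LUA_PATH template with the name of the script being loaded, and ';;' splices in the compiled-in default path. So for oltp_insert the template above resolves as sketched below (the directory is from my setup):

    # '?' becomes 'oltp_insert', so Lua looks for this file first
    ls /mnt/data/sysbench.lua/lua/oltp_insert.lua
    LUA_PATH="/mnt/data/sysbench.lua/lua/?.lua;;" sysbench oltp_insert help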


Wednesday, February 19, 2025

My database communities

I have been working on databases since 1996. In some cases I just worked on the product (Oracle & Informix), in others I consider myself a member of the community (MySQL, Postgres & RocksDB). And for MongoDB I used to be in the community.

I worked on Informix XPS in 1996. I chose Informix because I could live in Portland OR and walk to work. I was fresh out of school, didn't know much about DBMS, but got a great starter project (star query optimization). The company wasn't in great shape so I left by 1997 for Oracle. I never used Informix in production and didn't consider myself as part of the Informix community.

I was at Oracle from 1997 to 2005. The first 3 years were in Portland implementing JMS for the app server team and the last 5 years at Oracle HQ working on query execution.  I fixed many bugs, added support for ieee754 types, rewrote sort and maintained the sort and bitmap index row sources. The people there were great and I learned a lot but I did not enjoy the code base and left for a startup. I never used Oracle in production and don't consider myself as part of the Oracle community.

I led the MySQL engineering teams at Google for 4 years and at Facebook/Meta for 10 years. I was very much immersed in production and have been active in the community since 2006. The MySQL teams got much done at both Google (GTID, semi-sync, crash-safe replication, rewrote the InnoDB rw lock) and Facebook/Meta (MyRocks and too many other things to mention). Over the years at FB/Meta my job duties got in the way of programming so I used performance testing as a way to remain current. I also filed many bugs and might still be in the top-10 for bug reports. While Oracle has been a great steward for the MySQL project, I have been critical about the performance regressions from older MySQL to newer MySQL. I hope that eventually stops because it will become a big problem.

I contributed some code to RocksDB, mostly for monitoring. I spent much more time doing performance QA for it, and filing a few bugs. I am definitely in the community.

I don't use Postgres in production but have spent much time doing performance QA for it over the past ~10 years. A small part of that was done while at Meta, where I had a business case and was able to use some of their HW and my time. But most of this has been a volunteer effort -- more than 100 hours of my time and 10,000+ hours of server time. Some of those server hours are in public clouds (Google, Hetzner) so I am also spending a bit on this. I found a few performance bugs. I have not found large performance regressions over time which is impressive. I have met many of the contributors working on the bits I care about, and that has been a nice benefit.

I used to be a member of the MongoDB community. Like Postgres, I never supported it in production but I spent much time doing performance QA with it. I wrote mostly positive blog posts, filed more than a few bugs and even won the William Zola Community Award. But I am busy enough with MySQL, Postgres and RocksDB so I haven't tried to use it for years. Regardless, I continue to be impressed by how fast they pay down tech debt, with one exception (no cost-based optimizer).

Saturday, February 15, 2025

Vector indexes, large server, dbpedia-openai dataset: MariaDB, Qdrant and pgvector

My previous post has results for MariaDB and pgvector on the dbpedia-openai dataset. This post adds results from Qdrant. This uses ann-benchmarks to compare MariaDB, Qdrant and Postgres (pgvector) with a larger dataset, dbpedia-openai at 500k rows. The dataset has 1536 dimensions and uses angular (cosine) as the distance metric. This work was done by Small Datum LLC and sponsored by the MariaDB Corporation.

tl;dr

  • I am new to Qdrant so the chance that I made a mistake is larger than for MariaDB or Postgres
  • If you already run MariaDB or Postgres then I suggest you also use them for vector indexes
  • MariaDB usually gets ~2X more QPS than pgvector and ~1.5X more than Qdrant

Editorial

I have a bias. I am skeptical that you should deploy a new DBMS to support but one datatype (vectors) unless either you have no other DBMS in production or your production DBMS does not support vector indexing. 
  • Production is expensive -- you have to worry about security, backups, operational support
  • A new DBMS is expensive -- you have to spend time to learn how to use it
My initial response to Qdrant is that the new developer experience isn't very good. This can be fixed, but right now the product is complicated (has many features), configuration is complicated (also true for the DBMS I know, but I already paid that price), and the cognitive load is large. Just one example of the cognitive load is the need to learn the names that Qdrant uses for things that already have well-known names in the SQL DBMS world.

Deploying Qdrant

The more DBMS you include in one benchmark, the more likely you are to make a mistake because you lack expertise in all of those DBMS. I will soon learn whether I made a mistake here but I made a good faith effort to get good results from Qdrant.

I first tried to compile from source, but that failed. The docs state that "The current list of required libraries can be found in the Dockerfile" and while I was able to figure that out, I prefer that they just list the dependencies. Alas, my attempts to compile from source failed with error messages about problems with (gRPC) protocol definitions.

So I decided to try the Docker container they provide. I ended up not changing the Qdrant configuration provided in the Docker container. I spent some time doing performance debugging and didn't see anything to indicate that a config change was needed. For example, I didn't see disk IO during queries. But the performance debugging was harder because that Docker container image doesn't come with my favorite debug tools installed. Some of the tools were easy to install, others (perf) were not.
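For reference, this is roughly the standard Qdrant quickstart invocation. I am not claiming these were my exact flags, and the storage path is an example:

    docker pull qdrant/qdrant
    docker run -p 6333:6333 -v "$(pwd)/qdrant_storage:/qdrant/storage" qdrant/qdrant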

Benchmark

This post has much more detail about my approach in general. I ran the benchmark for 1 session. I use ann-benchmarks via my fork of a fork of a fork at this commit.

The ann-benchmarks config files are here for MariaDB, Postgres and Qdrant. For Postgres I set values for both M and ef_construction, but MariaDB doesn't support ef_construction so I only set M. While pgvector requires ef_construction to be >= 2*M, I do not know whether Qdrant has a similar requirement. Regardless, I only test cases where that constraint is true.

Some quantization was used
  • MariaDB uses 16-bit integers rather than float32
  • pgvector uses float32, pgvector halfvec uses float16
  • For Qdrant I used none (float32) and scalar (int8); a sketch of the collection config follows this list
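As mentioned above, a sketch of creating a Qdrant collection with scalar (int8) quantization via the REST API. The collection name is hypothetical and the vector size matches this dataset:

    curl -X PUT http://localhost:6333/collections/dbpedia \
      -H 'Content-Type: application/json' \
      -d '{
            "vectors": {"size": 1536, "distance": "Cosine"},
            "quantization_config": {"scalar": {"type": "int8"}}
          }'
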
I used a larger server (Hetzner ax162-s) with 48 cores, 128G of RAM, Ubuntu 22.04 and HW RAID 10 using 2 NVMe devices. The database configuration files are here for MariaDB and for Postgres. Output from the benchmark is here.

I had ps and vmstat running during the benchmark and confirmed there weren't storage reads as the table and index were cached by MariaDB and Postgres.

The command lines to run the benchmark using my helper scripts are:
    bash rall.batch.sh v1 dbpedia-openai-500k-angular c32r128

Results: QPS vs recall

These charts show the best QPS for a given recall. MariaDB gets more QPS than Qdrant and pgvector but that is harder to see as the recall approaches 1, so the next section has a table for best QPS per DBMS at a given recall.

Results: create index

The database configs for Postgres and MariaDB are shared above, and parallel index create is disabled by the config for Postgres and not supported yet by MariaDB. The summary is:
  • index sizes are similar between MariaDB and pgvector with halfvec
  • time to create the index varies a lot and it is better to consider it in the context of recall, which is done in the next section. But Qdrant creates indexes a lot faster than MariaDB or pgvector.
  • I did not find an accurate way to determine index size for Qdrant. There is a default method in ann-benchmarks that a DBMS can override. The default just compares process RSS before and after creating an index, which isn't accurate for small indexes. The MariaDB and Postgres code override the default and query the data dictionary to get a more accurate estimate.
The max time to create an index for MariaDB and Postgres exceeds 10,000 seconds on this dataset when M and/or ef_construction are large. The max time for Qdrant was <= 400 seconds for no quantization and <= 300 seconds for scalar quantization. This is excellent. But I wonder if things Qdrant does (or doesn't do) to save time during create index contribute to making queries slower, because MariaDB has much better QPS.

More details on index size and index create time for MariaDB and Postgres are in my previous post.

Results: best QPS for a given recall

Many benchmark results are marketed via peak performance (max throughput or min response time) but these are usually constrained optimization problems -- determine peak performance that satisfies some SLA. And the SLA might be response time or efficiency (cost).

With ann-benchmarks the constraint is recall. Below I share the best QPS for a given recall target along with the configuration parameters (M, ef_construction, ef_search) at which that occurs for each of the algorithms -- MariaDB, pgvector with float32 and float16/halfvec, Qdrant with no and scalar quantization.

Summary
  • Qdrant with scalar quantization does not get a result for recall=1.0 for the values of M, ef_construction and ef_search I used
  • MariaDB usually gets ~2X more QPS than pgvector and ~1.5X more than Qdrant
  • Index create time was much less for Qdrant (described above)
Legend:
  • recall, QPS - best QPS at that recall
  • rel2ma - (QPS for the given DBMS / QPS for MariaDB)
  • m= is the value for M when creating the index
  • ef_construct= is the value for ef_construction when creating the index
  • ef_search= is the value for ef_search when running queries
  • quant= is the quantization used by Qdrant
  • dbms
    • MariaDB - MariaDB, there is no option for quantization
    • PGVector - Postgres with pgvector and float32
    • PGVector_halfvec - Postgres with pgvector and halfvec (float16)
    • Qdrant(..., quant=none) - Qdrant with no quantization
    • Qdrant(..., quant=scalar) - Qdrant with scalar quantization
MariaDB gets more QPS than another DBMS when its rel2ma is less than 1.0; when rel2ma is 0.5, MariaDB gets 2X more QPS. Below, the rel2ma values are always much less than 1.0 except in the first group of results for recall = 1.0.

Best QPS with recall = 1.000
recall  QPS     rel2ma
1.000   18.3    1.00    MariaDB(m=32, ef_search=200)
1.000   49.4    2.70    PGVector(m=64, ef_construct=256, ef_search=400)
1.000   56.4    3.08    PGVector_halfvec(m=64, ef_construct=256, ef_search=400)
1.000  153.9    8.41    Qdrant(m=32, ef_construct=256, quant=none, hnsw_ef=400)

Best QPS with recall >= 0.99
recall  QPS     rel2ma
0.993   861     1.00    MariaDB(m=24, ef_search=10)
0.991   370     0.43    PGVector(m=16, ef_construct=256, ef_search=80)
0.990   422     0.49    PGVector_halfvec(m=16, ef_construct=192, ef_search=80)
0.990   572     0.66    Qdrant(m=32, ef_construct=256, quant=none, hnsw_ef=40)
0.990   764     0.89    Qdrant(m=48, ef_construct=192, quant=scalar, hnsw_ef=40)

Best QPS with recall >= 0.98
recall  QPS     rel2ma
0.983  1273     1.00    MariaDB(m=16, ef_search=10)
0.981   492     0.39    PGVector(m=32, ef_construct=192, ef_search=30)
0.982   545     0.43    PGVector_halfvec(m=32, ef_construct=192, ef_search=30)
0.981   713     0.56    Qdrant(m=16, ef_construct=192, quant=none, hnsw_ef=40)
0.980   895     0.70    Qdrant(m=16, ef_construct=256, quant=scalar, hnsw_ef=40)

Best QPS with recall >= 0.97
recall  QPS     rel2ma
0.983  1273     1.00    MariaDB(m=16, ef_search=10)
0.971   635     0.50    PGVector(m=32, ef_construct=192, ef_search=20)
0.971   724     0.57    PGVector_halfvec(m=32, ef_construct=192, ef_search=20)
0.972   782     0.61    Qdrant(m=16, ef_construct=192, quant=none, hnsw_ef=30)
0.970   982     0.77    Qdrant(m=16, ef_construct=192, quant=scalar, hnsw_ef=30)

Best QPS with recall >= 0.96
recall  QPS     rel2ma
0.969  1602     1.00    MariaDB(m=12, ef_search=10)
0.965   762     0.48    PGVector(m=16, ef_construct=192, ef_search=30)
0.964   835     0.52    PGVector_halfvec(m=16, ef_construct=192, ef_search=30)
0.963   811     0.51    Qdrant(m=16, ef_construct=96, quant=none, hnsw_ef=30)
0.961   996     0.62    Qdrant(m=16, ef_construct=96, quant=scalar, hnsw_ef=30)

Best QPS with recall >= 0.95
recall  QPS     rel2ma
0.969  1602     1.00    MariaDB(m=12, ef_search=10)
0.954   802     0.50    PGVector(m=16, ef_construct=96, ef_search=30)
0.955   880     0.55    PGVector_halfvec(m=16, ef_construct=96, ef_search=30)
0.954   869     0.54    Qdrant(m=8, ef_construct=256, quant=none, hnsw_ef=40)
0.950  1060     0.66    Qdrant(m=16, ef_construct=192, quant=scalar, hnsw_ef=20)

Monday, February 10, 2025

Vector indexes, MariaDB & pgvector, large server, dbpedia-openai dataset

This post has results from ann-benchmarks to compare MariaDB and Postgres with a larger dataset, dbpedia-openai at 100k, 500k and 1M rows. It has 1536 dimensions and uses angular (cosine) as the distance metric. By larger I mean by the standards of what is in ann-benchmarks. This work was done by Small Datum LLC and sponsored by the MariaDB Corporation.

tl;dr

  • Index create time was much less for MariaDB in all cases except the result for recall >= 0.95
  • For a given recall, MariaDB gets between 2.1X and 2.7X more QPS than Postgres
Benchmark

This post has much more detail about my approach in general. I ran the benchmark for 1 session. I use ann-benchmarks via my fork of a fork of a fork at this commit.  The ann-benchmarks config files are here for MariaDB and for Postgres.

I used a larger server (Hetzner ax162-s) with 48 cores, 128G of RAM, Ubuntu 22.04 and HW RAID 10 using 2 NVMe devices. The database configuration files are here for MariaDB and for Postgres. Output from the benchmark is here.

I had ps and vmstat running during the benchmark and confirmed there weren't storage reads as the table and index were cached by MariaDB and Postgres.

The command lines to run the benchmark using my helper scripts are:
    bash rall.batch.sh v1 dbpedia-openai-100k-angular c32r128
    bash rall.batch.sh v1 dbpedia-openai-500k-angular c32r128
    bash rall.batch.sh v1 dbpedia-openai-1000k-angular c32r128

Results: QPS vs recall

These charts show the best QPS for a given recall. MariaDB gets about 2X more QPS than Postgres for a specific recall level.

With 100k rows

With 500k rows

With 1M rows

Results: create index

The database configs for Postgres and MariaDB are shared above, and parallel index create is disabled by the config for Postgres and not supported yet by MariaDB. The summary is:
  • index sizes are similar between MariaDB and pgvector with halfvec
  • time to create the index varies a lot and it is better to consider it in the context of recall, which is done in the next section
Legend
  • M - value for M when creating the index
  • cons - value for ef_construction when creating the index
  • secs - time in seconds to create the index
  • size(MB) - index size in MB
Table sizes:
  • Postgres is 7734M
  • MariaDB is 7856M

                -- pgvector --          -- pgvector --
                -- float32 --           -- halfvec  --
M       cons    secs    size(MB)        secs    size(MB)
 8       32       458   7734             402    3867
16       32       720   7734             655    3867
 8       64       699   7734             627    3867
16       64      1144   7734            1029    3867
32       64      2033   7734            1880    3867
 8       96       934   7734             843    3867
16       96      1537   7734            1382    3867
32       96      2730   7734            2482    3867
48       96      4039   7734            3725    3867
 8      192      1606   7734            1409    3867
16      192      2778   7734            2435    3867
32      192      4683   7734            4154    3867
48      192      6830   7734            6106    3867
64      192      8601   7734            7831    3958
 8      256      2028   7734            1764    3867
16      256      3609   7734            3151    3867
32      256      5838   7734            5056    3867
48      256      8224   7734            7283    3867
64      256     11031   7734            9931    3957

mariadb
M       secs    size(MB)
 4        318   3976
 5        372   3976
 6        465   3976
 8        717   3976
12       1550   3976
16       2887   3976
24       7248   3976
32      14120   3976
48      36697   3980
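A hedged sketch of the MariaDB DDL behind a table like the one above: M is set per index, there is no ef_construction knob, and mhnsw_ef_search is the query-time analog of ef_search. Table and column names are hypothetical:

    mariadb test <<'SQL'
    CREATE TABLE embeddings (
      id INT UNSIGNED PRIMARY KEY,
      v VECTOR(1536) NOT NULL,
      VECTOR INDEX (v) M=16 DISTANCE=cosine
    );
    SET mhnsw_ef_search = 10;  -- query-time knob, analogous to ef_search
    SQL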

Results: best QPS for a given recall

Many benchmark results are marketed via peak performance (max throughput or min response time) but these are usually constrained optimization problems -- determine peak performance that satisfies some SLA. And the SLA might be response time or efficiency (cost).

With ann-benchmarks the constraint is recall. Below I share the best QPS for a given recall target along with the configuration parameters (M, ef_construction, ef_search) at which that occurs for each of the algorithms (MariaDB, pgvector with float32, pgvector with float16/halfvec).

Summary
  • Postgres does not get recall=1.0 for the values of M, ef_construction and ef_search I used
  • Index create time was much less for MariaDB in all cases except the result for recall >= 0.95
  • For a given recall target, MariaDB gets between 2.1X and 2.7X more QPS than Postgres
Legend:
  • recall, QPS - best QPS at that recall
  • isecs - time to create the index in seconds
  • m= - value for M when creating the index
  • ef_cons= - value for ef_construction when creating the index
  • ef_search= - value for ef_search when running queries
Best QPS with recall >= 1.000, pgvector did not reach the recall target
recall  QPS     isecs
1.000    20     36697   MariaDB(m=48, ef_search=40)

Best QPS with recall >= 0.99, MariaDB gets >= 2.2X more QPS than Postgres
recall  QPS     isecs
0.990   287     8224    PGVector(m=48, ef_cons=256, ef_search=40)
0.990   321     7283    PGVector_halfvec(m=48, ef_cons=256, ef_search=40)
0.992   731     7248    MariaDB(m=24, ef_search=10)

Best QPS with recall >= 0.98, MariaDB gets >= 2.7X more QPS than Postgres
recall  QPS     isecs
0.984    375    4683    PGVector(m=32, ef_cons=192, ef_search=40)
0.984    418    4154    PGVector_halfvec(m=32, ef_cons=192, ef_search=40)
0.981   1130    2887    MariaDB(m=16, ef_search=10)

Best QPS with recall >= 0.97, MariaDB gets >= 2.3X more QPS than Postgres
recall  QPS     isecs
0.974    440    6830    PGVector(m=48, ef_cons=192, ef_search=20)
0.973    483    6106    PGVector_halfvec(m=48, ef_cons=192, ef_search=20)
0.981   1130    2887    MariaDB(m=16, ef_search=10)

Best QPS with recall >= 0.96, MariaDB gets >= 2.2X more QPS than Postgres
recall  QPS     isecs
0.962    568    4683    PGVector(m=32, ef_cons=192, ef_search=20)
0.961    635    4154    PGVector_halfvec(m=32, ef_cons=192, ef_search=20)
0.965   1433    1550    MariaDB(m=12, ef_search=10)

Best QPS with recall >= 0.95, MariaDB gets >= 2.1X more QPS
recall  QPS     isecs
0.953    588    2730    PGVector(m=32, ef_cons=96, ef_search=20)
0.957    662    1382    PGVector_halfvec(m=16, ef_cons=96, ef_search=40)
0.965   1433    1550    MariaDB(m=12, ef_search=10)
