Monday, October 31, 2022

s/optimal/better/g - on reviewing conference papers

I spent a few years reviewing papers for database conferences. I think that is winding down. I was OK as a reviewer, definitely not great, and this summarizes my experience.

For starters, I am in awe of good reviwers. As a reviewer you get to see feedback from the other reviwers after submitting your review. And I was always nervous while waiting to see the other reviews. Was my review an outlier? How much did I miss in my review? Reading good reviews after submitting a mediocre review is a great way to learn.

As always, a key to success is to choose the right base case especially if you want to show linear speedup or scaleup. Many of the papers use the DBMS that I know quite well (MySQL, RocksDB) as the base case. So I have frequent someone on the internet is wrong moments while reading such papers. My offer to provide (free) benchmark advice to research projects was ignored. But maybe that is OK, because I am already busy.

My goal was to focus on the ideas in the paper and allow that the experimental results might be truthy rather than true. There are many interesting ideas in conference papers even if the experimental results aren't perfect. The review process was better for me after I learned to appreciate the ideas and worry less about flaws in the benchmarks.

I prefer to read a paper that explains where the new idea both is and is not better, but perhaps the marketing pressure and page limit means that papers don't focus much on where the new idea isn't better. I also prefer to have an explanation of the results -- I have had to retract a few DBMS perf blog posts because I didn't explain the results and they turned out to be misleading or bogus. Alas, papers rarely explained the (disappointing) results of the base case when showing they were so much faster than the base case.

Frequent feedback I had for papers:

  • s/optimal/better/g because optimal has a high bar. Papers would show their thing improved performance in some cases but better might not be optimal. Some math would be needed for that.
  • Provide more details to make reproduction possible. I am not offering to spend the time to do a repro nor am I suggesting that others should spend their time on that. But I want to know basic things about the base case (the thing that your thing is better than) including version tested, configuration details and command lines.
A short rant on the innovation bar
  • By innovation bar I mean the requirement that a paper shows how the idea is new relative to previously published peer-reviewed research. For some research track papers that I have reviewed I feel like the bar is raised too high, but that isn't my rant. My rant is that I have seen at least one industrial track paper rejected for not being innovative and my reading of the guidelines for industrial track papers in VLDB and SIGMOD is that there isn't an innovation bar for industry papers.
Bogus feedback that I avoided giving:
  • This idea has been implemented by DBMS or documented in some blog post. Conferences place a lot of value on innovation but fortunately the scope for prior art is limited to peer reviewed publications, not random blogs like this one. And researchers should not be expected to know all of the implementation details for amazing but not always well documented open and closed source DBMS. By bogus I mean that this doesn't count against the innovation bar -- research-track papers must show that something is innovative in their work. When said idea has previously been published then the reviewed paper must show how it differs.  But the paper isn't required to show how it differs from an idea described in a blog post or implemented in a production DBMS. Sometimes I gave feedback with a strong disclaimer when ideas have previously appeared in blog posts or an existing DBMS. And by strong disclaimer I mean that I made it clear that I would not and could not hold this against the paper.
Feedback that I was extremely reluctant to provide:
  1. Run X more tests. I am happy to require that before software is deployed in production. I am wary of asking for much more unless the experiment section is sad.
Updates:
  • Clarified what I mean by bogus feedback
  • Added rant on innovation bar

No comments:

Post a Comment

Evaluating vector indexes in MariaDB and pgvector: part 2

This post has results from the ann-benchmarks with the   fashion-mnist-784-euclidean  dataset for MariaDB and Postgres (pgvector) with conc...