Small Datum: profound


Wednesday, March 4, 2020

RDBMS != SQL DBMS

We use RDBMS as another name for SQL DBMS, but SQL isn't relational. That isn't news; see this web site and book. SQL allows for a relational design but doesn't require one, so 1NF and 3NF are optional. JSON is in the SQL:2016 spec. What would Codd think?

Consider Oracle as an example of a SQL DBMS. First there was support for collection data types, then XML, and eventually JSON. These let you violate 1NF. I won't argue whether they should be used; I only claim that they can be used.
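To make "they can be used" concrete, here is a minimal sketch that swaps Oracle for SQLite through Python's standard `sqlite3` module, only because it needs no server. The same SQL DBMS accepts both a 1NF design (a child table with one phone number per row) and a non-1NF design (all phone numbers packed into one JSON array in a text column). The table and column names are hypothetical, and the JSON is stored as plain text and parsed in Python, so no SQLite JSON functions are assumed.

```python
import json
import sqlite3


def phones_as_json() -> list[tuple[int, list[str]]]:
    """Non-1NF: a repeating group stored as a JSON array in one column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, phones TEXT)")
    conn.execute("INSERT INTO customer VALUES (?, ?)",
                 (1, json.dumps(["555-0100", "555-0101"])))
    conn.execute("INSERT INTO customer VALUES (?, ?)",
                 (2, json.dumps(["555-0199"])))
    rows = conn.execute("SELECT id, phones FROM customer ORDER BY id").fetchall()
    conn.close()
    # Each row holds a list of values, which is what 1NF forbids.
    return [(cid, json.loads(phones)) for cid, phones in rows]


def phones_normalized() -> list[tuple[int, list[str]]]:
    """1NF: one atomic phone number per row in a child table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE customer_phone ("
                 "customer_id INTEGER REFERENCES customer(id), "
                 "phone TEXT, PRIMARY KEY (customer_id, phone))")
    conn.executemany("INSERT INTO customer VALUES (?)", [(1,), (2,)])
    conn.executemany("INSERT INTO customer_phone VALUES (?, ?)",
                     [(1, "555-0100"), (1, "555-0101"), (2, "555-0199")])
    rows = conn.execute("SELECT customer_id, phone FROM customer_phone "
                        "ORDER BY customer_id, phone").fetchall()
    conn.close()
    # Regroup the atomic rows in the application to compare with the JSON form.
    grouped: dict[int, list[str]] = {}
    for cid, phone in rows:
        grouped.setdefault(cid, []).append(phone)
    return sorted(grouped.items())
```

Both functions return the same data; the DBMS doesn't care which design the application picks.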

Have there been surveys documenting how often the relational approach is used with a SQL DBMS? I assume it is better to think of a distribution of approaches (a value between 0 and 1, where 0 is SQL and 1 is relational) rather than a binary choice of relational vs SQL (not relational). I might call the SQL endpoint the pragmatic approach, but that introduces bias. While I have spent a long time working on SQL DBMS, I usually work under the hood and don't design applications.

Friday, January 31, 2020

Copyleft vs the DeWitt Clause

There is recent benchmarketing drama between AWS and Microsoft.

Section 1.8 of the AWS service terms includes:

(ii) agree that we may perform and disclose the results of Benchmarks of your products or services, irrespective of any restrictions on Benchmarks in the terms governing your products or services.

Some software includes a DeWitt Clause to prevent users and competitors from publishing benchmark results. I am not a lawyer, but I wonder whether section 1.8 of the AWS service terms allows Amazon to counter with its own benchmark results when a competitor's software or service uses a DeWitt Clause. This would be similar to the effect of copyleft.

I hope David DeWitt doesn't mind the attention that the DeWitt Clause receives. He has done remarkable database research that has produced so much: great PhD topics, better DBMS products, a larger CS department at UW-Madison and many jobs. But he is also famous for the DeWitt Clause.

Monday, January 7, 2019

Define "better"

Welcome to my first rant of 2019, although I have written about this before. While I enjoy benchmarketing from a distance, it is not much fun to be in the middle of it. The RocksDB project has been successful and has thus become the base case for products and research claiming that something else is better. While I have no doubt that other things can be better, I am wary about the definition of better.

There are at least 3 ways to define better when evaluating database performance. The first, faster is better, ignores efficiency; the last two do not. I'd rather not ignore efficiency. The marginal return of X more QPS eventually becomes zero, while the benefit of using less hardware is usually greater than zero.

1. Optimize for throughput and ignore efficiency (faster is better)
2. Get good enough performance and then optimize for efficiency
3. Get good enough efficiency and then optimize for throughput
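The three definitions can pick different winners from the same results. The sketch below is a hypothetical illustration, not the author's method: each result is reduced to a throughput (`qps`) and a hardware cost (`servers`), and the numbers, names and thresholds are made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Result:
    name: str
    qps: float    # throughput measured by a benchmark
    servers: int  # hardware consumed to get that throughput


def faster_is_better(results: list[Result]) -> Optional[Result]:
    # 1: maximize throughput and ignore the hardware cost.
    return max(results, key=lambda r: r.qps, default=None)


def good_enough_perf_then_efficiency(results: list[Result],
                                     min_qps: float) -> Optional[Result]:
    # 2: among results that meet the QPS target, use the least hardware.
    ok = [r for r in results if r.qps >= min_qps]
    return min(ok, key=lambda r: r.servers, default=None)


def good_enough_efficiency_then_throughput(results: list[Result],
                                           max_servers: int) -> Optional[Result]:
    # 3: among results within the hardware budget, maximize throughput.
    ok = [r for r in results if r.servers <= max_servers]
    return max(ok, key=lambda r: r.qps, default=None)


# Hypothetical results, not measurements.
RESULTS = [
    Result("A", qps=100_000, servers=10),
    Result("B", qps=80_000, servers=4),
    Result("C", qps=60_000, servers=2),
]
```

With a target of 50,000 QPS for #2 and a budget of 5 servers for #3, the three rules choose A, C and B respectively. Each function returns None when nothing meets its threshold.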

Call to action

I forgot to include this before publishing. Whether #1, #2 or #3 is followed, I hope that more performance results include details on the HW consumed to create that performance. How much memory and disk space were used? What was the CPU utilization? How many bytes were read from and written to storage? How much random IO was used? I try to report both absolute and relative values, where relative values are normalized by the transaction rate.
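Normalizing by the transaction rate is simple arithmetic: divide each per-second resource rate by transactions per second. Here is a minimal sketch, assuming the absolute values were already averaged over a benchmark run; the field and metric names are hypothetical and this is not the author's tooling. Memory and disk space are left out because they are usually reported as absolute values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    # Absolute values averaged over a benchmark run (hypothetical field names).
    tps: float                  # transactions per second
    cpu_util: float             # fraction of all cores busy, 0.0 to 1.0
    cores: int
    read_bytes_per_sec: float   # bytes read from storage per second
    write_bytes_per_sec: float  # bytes written to storage per second
    read_iops: float            # storage read operations per second


def per_transaction(s: Sample) -> dict[str, float]:
    """Normalize absolute HW consumption by the transaction rate."""
    if s.tps <= 0:
        raise ValueError("tps must be positive")
    return {
        # CPU microseconds consumed per transaction across all cores
        "cpu_usecs_per_txn": s.cpu_util * s.cores * 1_000_000 / s.tps,
        "read_kb_per_txn": s.read_bytes_per_sec / 1024 / s.tps,
        "write_kb_per_txn": s.write_bytes_per_sec / 1024 / s.tps,
        "reads_per_txn": s.read_iops / s.tps,
    }
```

For example, 10,000 TPS at 50% utilization of 16 cores costs 800 CPU microseconds per transaction. A system that is faster but burns more CPU per transaction shows up clearly in these relative values.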

Thursday, November 9, 2017

Advice on advice

Advice is free, but sometimes you get what you pay for. In technology there is an abundance of great ideas and much of that free advice might be excellent. Thanks to my time with web-scale MySQL I have a lot of experience in receiving advice, so I will offer advice on giving it. I realize that my advice on advice applies to my advice. I don't write many posts like this; my last might have been this one.

A big problem is that time is limited. We don't have enough time to evaluate all of the good ideas. Just like a modern query optimizer, we can't do exhaustive search. Our innovation budget is small, so we must find the best idea that can be implemented and deployed within that budget. Another problem is uncertainty. The system we have is likely good enough today. It was probably good enough yesterday. Any benchmark is unlikely to capture the ability to adapt to changes over time in workload and hardware. That ability is valuable.

Finally, my advice. The list below is ordered from least to most effective ways to offer advice:

You should do X (and can you tell me more about your system?)
... and I have taken the time to understand your system
... and I will get the resources to evaluate it
... and I will get the resources to implement it
... and I will join the oncall for it when this is deployed
... and my reputation is on the line when this goes horribly wrong
... and I will stick around for a few years to fix all of the bugs that show up