Thursday, April 9, 2020

Reviving mstat

A long time ago I wrote mstat to collect iostat, vmstat and MySQL (show global status) performance counters. It samples everything at the same interval (every N seconds), computes rates for all numeric values and has some support for expressions so I can define a new counter as a function of other counters.

Eventually I stopped using it and then there were problems. Recently I fixed many of them. The tool now works with Python3 and supports old and new iostat output format. This week I started to add support for Postgres and MongoDB. It gets data from pg_stat_bgwriter for Postgres and from db.serverStatus() for MongoDB. So the tool is useful to me again but it won't be useful to others until I add docs.

The tool prints one line in CSV format per time interval. The line is likely to be long because there can be hundreds of counters in the line. The name and offset for each counter is printed at startup. I extract data using bash and awk as part of my benchmark workflow.

Because of this dependency to list the offset per counter at startup the tool is not able to handle dynamic data yet. An example of dynamic data are per-DBMS counters such as the rows in pg_stat_database. That view has one row per database and the database names (datname) are not fixed. Similar problems exist for the per-table and per-index monitoring tables and views provided by MySQL, other stats views provided by Postgres and other monitoring data provided by MongoDB. So I need more time to figure out how to support them.

I have more work to do on mstat, but I am happy that I can use it again:
  • I need to figure out whether there are global counters in Postgres for things like queries and rows read
  • For MongoDB mstat has a hack to replace space, dot and parentheses with _ in key names
  • More work remains to make sure that counters are ignored unless their type is numeric or a date

Monday, April 6, 2020

Creating performance reports

I am trying to automate some of the tasks that I must do to produce performance reports. By trying I mean that I am just getting started on this and will be making decisions I might regret later. This is also a request for advice, feedback and corrections.

Some requirements, not all of which will be satisfied:
  • interactive - I prefer that the format can be hosted in a way that allows for discussions. Google Docs does a great job at this by supporting access control and threaded discussions on comments.
  • graphics - support charts and graphs. Back in the day I was a happy user of Google Image Charts that let me define a chart in a URL. Alas, the service has been turned off and the new thing from Google requires Javascript.
  • self-contained - I want to share the report as one thing. With Google Docs that thing can be the URL for the doc, even when there are additional things, like spreadsheets, linked to it. With HTML the thing is the HTML document.
  • script-able - I want to create as much as possible of the document via scripting. I frequently repeat tests and don't want to waste time reformatting tables.
  • private - some reports are not yet public and I prefer a solution that isn't always public. However I also want a solution that works for company-private reports and also for reports that are shared with the public. 
  • other - supports wide lines (via a slider, wrapping wide lines is lousy for a reader), what else?

Choices:
  • Google docs
    • interactive - yes
    • graphics - yes
    • self-contained -yes
    • scriptable - no
    • privacy - yes
    • other - no support for wide lines, they are always wrapped
  • Github Pages
    • interactive - no, always public
    • graphics - maybe
    • self-contained - yes
    • scriptable - yes
    • privacy - no, always public
    • other - supports wide lines
  • Blogger
    • interactive - somewhat via comments but not as nice as Google docs, but always public
    • graphics - yes by embedding images
    • self-contained - yes
    • scriptable - maybe if I can paste an HTML doc as the content
    • privacy - no, always public
    • other - weak support for wide lines
  • Plain HTML
    • interactive - maybe*
    • graphics - maybe*
    • self-contained - maybe*
    • scriptable - yes
    • privacy - yes*
    • other - supports wide lines
From the above I am not aware of anything close to a perfect solution. I am likely to choose plain HTML. Privacy can be managed by sharing via company email and or hosting the HTML files on company-private servers.

Charts and graphs could be provided by a web-service that lets a chart be defined in a URL. Quickchart is an example of that. Of course, use of such a service means the data is no longer private. But there is a difference between posting a web-page for the world to read and sharing that data with the company that provides the URL chart service.

If I use plain HTML then I must find a way to share such pages with the public. That last sentence is amusing given that the web runs on HTML. But I have never done this. I have been a long-time, and mostly happy, user of Blogger. I have begun to use Github Pages and am a big fan of markup.

As a bonus, I prefer a solution that doesn't depend on the whims of a vendor. For example, if I commit to service X and the vendor for service X turns off that service, I don't want to spend weeks reformatting reports while I migrate to a new service.

It would be great if I could inline some of the charts in the HTML document. I need to figure out whether there are good tools for simple ascii bar charts and graphs that can be pasted into an HTML document.

I am a fan of gnuplot and need to learn more about how I can share gnuplot output in PNG format as part of an HTML document without posting all of that on the web.

Evaluating vector indexes in MariaDB and pgvector: part 2

This post has results from the ann-benchmarks with the   fashion-mnist-784-euclidean  dataset for MariaDB and Postgres (pgvector) with conc...