I am trying out a dedicated server from Hetzner for my performance work: the AX162-S, which has 48 cores (96 vCPUs), 128G of RAM and ~4TB of NVMe storage (2 devices in RAID 1).
The reasons to try it are:
- A similar server on GCP or AWS will cost ~10X more assuming list prices. The difference on GCP drops to ~5X if I commit to 3 years of usage.
- It provides access to hardware performance counters (PMCs), which I need as part of my debugging workflow. Documentation on HW counters for GCP and AWS servers barely exists. But maybe GCP and AWS don't want you to improve efficiency given the markups they have on their hardware.
- AFAIK with AWS you need a dedicated server (*metal) which means you will be buying something large (and expensive).
- With GCP you need a C4 machine type, you only get a subset of the HW counters, and I was unable to get a quota increase for C4 this weekend, so the largest server I could get had 16 cores.
So far I am impressed by both the cost and the user experience. I expect to rent a second server soon and stop spending too much money on GCP.
Create account and purchase
Before my purchase there was an account verification step. I started it on Monday and it was resolved by the next day. Once the account was verified, it took less than 10 minutes for my server to be ready, which matched the estimate on the purchase page.
Everything (the OS, my home directory and the directories I will use for the database) is on the RAID 1 storage. On GCP and on my home servers I try to use a separate device for the database, but I think adding storage devices here would have meant waiting a few days for Hetzner to configure them.
When making the purchase I had a choice of operating systems (I chose Ubuntu 22.04). There was also a choice for password or ssh logins. When you choose ssh there is a text box for a public key. I chose ssh.
The specs are:
- AMD EPYC 9454P 48-Core Processor with SMT enabled
- 2 Intel D7-P5520 NVMe storage devices with RAID 1 (3.8T each) using ext4 and ext3
- 128G RAM (for the same price I can get 256G but half the storage)
- Ubuntu 22.04 running the non-HWE kernel (5.15.0-118-generic)
Setup
The first thing to do is add a non-root user because Postgres doesn't run as root. I added a user without a password, added it to the sudo group and finally set up ssh for it.
adduser --disabled-password --shell /bin/bash --gecos "mark" "$username"
adduser "$username" sudo
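The last step, setting up ssh for the new user, can be done several ways. Below is a minimal sketch, assuming the public key pasted on the purchase page ended up in /root/.ssh/authorized_keys (an assumption, not something confirmed here). copy_ssh_keys is a hypothetical helper name, not a standard tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: give a new user the SSH public key(s) root was set up
# with. Run as root, e.g.: copy_ssh_keys "$username"
copy_ssh_keys() {
  local user="$1"
  local src="${2:-/root/.ssh/authorized_keys}"   # assumed location of the purchase-time key
  local home="${3:-/home/$user}"
  local group
  group="$(id -gn "$user")" || return 1          # the user's primary group

  # sshd's StrictModes rejects keys if ~/.ssh or authorized_keys are group- or world-writable
  install -d -m 700 -o "$user" -g "$group" "$home/.ssh" || return 1
  install -m 600 -o "$user" -g "$group" "$src" "$home/.ssh/authorized_keys"
}
```

Using install rather than separate mkdir, cp, chown and chmod calls sets owner and mode with one command per path.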
Output from /proc/cpuinfo is here. It reports 96 CPUs, so I need to disable AMD SMT, because SMT (like Intel Hyper-Threading) is frequently a problem for benchmarks. I first added this to /etc/default/grub to disable SMT, but it didn't work:
GRUB_CMDLINE_LINUX_DEFAULT="nosmt"
Adding it to the existing usage of GRUB_CMDLINE_LINUX did work:
GRUB_CMDLINE_LINUX="consoleblank=0 systemd.show_status=true nosmt"
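Editing /etc/default/grub only takes effect after grub.cfg is regenerated (update-grub on Ubuntu) and the server is rebooted. Below is a minimal sketch to confirm the result afterwards, assuming the kernel exposes /sys/devices/system/cpu/smt/control (the Ubuntu 22.04 kernel does). check_smt_off is a hypothetical helper name. With nosmt the control file should read off and nproc should report 48. Writing off to that file (echo off | sudo tee /sys/devices/system/cpu/smt/control) also disables SMT at runtime, but that does not persist across reboots.

```shell
#!/usr/bin/env bash
# Apply the grub change first:  sudo update-grub && sudo reboot
#
# Hypothetical helper: confirm SMT is off. The sysfs directory is a parameter
# so the check can also be run against a copy of those files.
check_smt_off() {
  local cpu_dir="${1:-/sys/devices/system/cpu}"
  local control online
  control="$(cat "$cpu_dir/smt/control")" || return 1   # on, off, forceoff, notsupported, ...
  online="$(cat "$cpu_dir/online")" || return 1         # online CPU list, e.g. 0-47
  echo "smt/control=$control online=$online"
  # nosmt gives "off"; nosmt=force gives "forceoff"
  [ "$control" = "off" ] || [ "$control" = "forceoff" ]
}
```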
From df -h I see that the / partition (/dev/md2) is 2.0T with 1.9T available, and I will use it for the database directories.
# df -h
Filesystem      Size  Used Avail Use% Mounted on
tmpfs            13G   26M   13G   1% /run
/dev/md2        2.0T  3.2G  1.9T   1% /
tmpfs            63G     0   63G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/md1        989M  257M  682M  28% /boot
/dev/nvme1n1p1  256M  316K  256M   1% /boot/efi
/dev/md3        1.5T   28K  1.5T   1% /home
tmpfs            13G  4.0K   13G   1% /run/user/0
From mdadm I see that the RAID 1 device is still resyncing, so I will wait for that to finish before running benchmarks. A few hours after my purchase, the resync is ~80% complete. Output from sudo mdadm --detail /dev/md2 is here.
Output from smartctl -a /dev/nvme0n1p4 is here; that is how I figured out the brand of the NVMe devices.
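To watch the resync without rerunning mdadm, /proc/mdstat shows a progress line for each array that is rebuilding. Below is a minimal sketch that pulls the percentage out of it and computes a rough lower bound on resync time. The helper names md_progress and resync_minutes are made up for illustration. The estimate assumes the rebuild runs at the kernel's dev.raid.speed_limit_max, which is in KiB/s per device and defaults to 200000 (about 200MB/s); a real resync can be slower when other I/O competes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<array> <percent>" for each md array that is
# resyncing or recovering. The file is a parameter so a saved copy can be parsed.
md_progress() {
  local mdstat="${1:-/proc/mdstat}"
  awk '
    /^md[0-9]+ :/ { dev = $1 }                  # header line of each array
    /(resync|recovery) *=/ {
      if (match($0, /[0-9.]+%/))               # e.g. 80.3%; absent for resync=PENDING
        print dev, substr($0, RSTART, RLENGTH)
    }' "$mdstat"
}

# Hypothetical helper: whole minutes (rounded down) to resync SIZE_GIB of data
# when the rebuild runs at LIMIT_KIB KiB/s, i.e. at dev.raid.speed_limit_max.
resync_minutes() {
  local size_gib="$1" limit_kib="$2"
  echo $(( size_gib * 1024 * 1024 / limit_kib / 60 ))
}
```

For example, resync_minutes 2048 200000 prints 178, so roughly three hours for a 2 TiB array at the default limit, while resync_minutes 2048 2000000 prints 17. The current limits can be read with sysctl dev.raid.speed_limit_min dev.raid.speed_limit_max.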
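If only the brand and model are needed, the Linux NVMe driver also exposes a model attribute for each controller in sysfs, and lsblk -d -o NAME,MODEL shows the same string per disk. Below is a minimal sketch that reads sysfs; nvme_models is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<controller>: <model>" for each NVMe controller.
# The sysfs class directory is a parameter so a copy of it can be used.
nvme_models() {
  local root="${1:-/sys/class/nvme}"
  local d
  for d in "$root"/nvme*; do
    if [ -r "$d/model" ]; then
      # model strings are padded with trailing spaces, so trim them
      printf '%s: %s\n' "${d##*/}" "$(sed 's/[[:space:]]*$//' "$d/model")"
    fi
  done
}
```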
Setup: CPU frequency governor
I have written about CPU frequency governors and modern Linux -- see here. With Ubuntu, this server uses the schedutil frequency governor by default:
$ cpupower frequency-info
analyzing CPU 0:
  driver: acpi-cpufreq
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 1.50 GHz - 3.81 GHz
  available frequency steps:  2.75 GHz, 2.10 GHz, 1.50 GHz
  available cpufreq governors: conservative ondemand userspace powersave performance schedutil
  current policy: frequency should be within 1.50 GHz and 2.75 GHz.
                  The governor "schedutil" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 3.78 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: no
So I run these commands:
cpupower frequency-set --governor performance
cpupower frequency-set -u 2.75GHz
And now I see:
analyzing CPU 0:
  driver: acpi-cpufreq
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 1.50 GHz - 3.81 GHz
  available frequency steps:  2.75 GHz, 2.10 GHz, 1.50 GHz
  available cpufreq governors: conservative ondemand userspace powersave performance schedutil
  current policy: frequency should be within 1.50 GHz and 2.75 GHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 1.80 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: no
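The cpupower frequency-info output above only covers CPU 0, and settings made with cpupower frequency-set are written to sysfs, so they do not survive a reboot. Below is a minimal sketch that counts CPUs per governor from sysfs so every CPU can be checked, for example after each boot; governor_summary is a hypothetical helper name. With SMT disabled and the governor applied to all CPUs, one would expect a single line such as 48 performance.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count CPUs per cpufreq governor by reading
# scaling_governor for every CPU. The sysfs directory is a parameter.
governor_summary() {
  local cpu_dir="${1:-/sys/devices/system/cpu}"
  local f
  for f in "$cpu_dir"/cpu[0-9]*/cpufreq/scaling_governor; do
    # offline CPUs (such as SMT siblings with nosmt) may have no cpufreq entry
    if [ -r "$f" ]; then
      cat "$f"
    fi
  done | sort | uniq -c
}
```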
Future setup
Eventually I will purchase their storage service so I can archive things I want to preserve for a long time including results and builds of MySQL and Postgres.
The kernel default RAID rebuild speed limit is 200MB/sec; much too slow for modern NVME; add a few zeros and move on.
sudo sysctl -w dev.raid.speed_limit_max=2000000
Thank you for the advice. Fortunately I could see how fast, or slow, it was running via iostat and I found things to do while running -- building many versions of Postgres, MariaDB and MySQL.