Thursday, April 18, 2024

Changes to managing turbo boost in Ubuntu 22.04 and Linux 6.5

I often use HWE kernels with Ubuntu and currently use Ubuntu 22.04. Until recently that meant I ran Linux 6.2 but after a recent update I am now on Linux 6.5.

I am far from an expert on this topic and what I write here might just be notes to myself. Be wary of following my advice.

Disabling turbo boost yesterday

I have been disabling turbo boost for many years on my home test servers to reduce performance variance from hardware, especially as the weather gets warm because I don't have a server room with AC.  The problem with turbo boost on some of my servers was cyclical behavior:

  1. CPU cools, turbo boost does its thing
  2. benchmark runs faster
  3. CPU gets hot
  4. turbo boost stops doing its thing
  5. benchmark runs slower
  6. repeat

On my Intel servers I disable turbo boost via BIOS settings. On my AMD servers that used to be done via a script because I was using acpi-cpufreq: echo 0 > /sys/devices/system/cpu/cpufreq/boost
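
A minimal sketch of such a script, assuming the acpi-cpufreq driver is loaded and the boost file shown above exists. The function name disable_boost and the BOOST_FILE variable are made up for illustration; only the sysfs path comes from the command above.

```shell
#!/bin/sh
# Sketch: turn off turbo boost when the acpi-cpufreq boost knob is present.
# BOOST_FILE is a hypothetical override; the default is the path used above.
BOOST_FILE="${BOOST_FILE:-/sys/devices/system/cpu/cpufreq/boost}"

disable_boost() {
    # $1: path to the boost file
    if [ ! -e "$1" ]; then
        echo "no boost file at $1 (the driver may not be acpi-cpufreq)" >&2
        return 1
    fi
    # Writing 0 disables boost; on a real system this needs root.
    echo 0 > "$1" || return 1
    echo "boost is now $(cat "$1")"
}

# Usage (as root): disable_boost "$BOOST_FILE"
```

Checking for the file first makes the failure obvious when a different driver is in use, rather than failing with a bare redirection error.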

My goal is repeatable performance and I am willing to sacrifice peak HW performance to get that. Avoiding the cycle described above helps to achieve that. Alas this is a spectrum: I tolerate other things (CPU cache, database cache) that improve performance while adding variance. I assume that I want CPU frequency to stay within a narrow range. It isn't clear that I was getting a narrow range even with acpi-cpufreq, but it did help.

For the Ryzen 7 7840HS CPU I use on these servers, the AMD specs state that the base speed is 3.8GHz and the max boost is up to 5.1GHz. With acpi-cpufreq the CPU cores can be in one of three frequency levels, and from cpupower frequency-info they are:

available frequency steps:  3.80 GHz, 2.20 GHz, 1.60 GHz

So even with turbo boost disabled (see the echo command above) there is still room for variance. But I don't know enough to determine whether I need to do more tuning.

Disabling turbo boost today

After a recent update on Ubuntu 22.04 with HWE kernels I now run 6.5.0-27-generic and acpi-cpufreq has been replaced by amd-pstate. I am sure there are many benefits from this change. Alas, it also brings complexity and confusion for users who now have server cooling problems (because things are running faster) and are trying to figure out how to fix them. Notes on setting up the server are here.

I noticed this change because with the default (amd-pstate in active mode) this file doesn't exist:

/sys/devices/system/cpu/cpufreq/boost

On a Ryzen 7 CPU I get the amd-pstate-epp driver in active mode. Output from /proc/cpuinfo and cpupower frequency-info in this state is below. Note that /sys/devices/system/cpu/cpufreq/boost doesn't exist in active mode but does exist in guided or passive mode. So I need to either switch to guided or passive mode or roll back to the acpi-cpufreq driver, which means I need to understand a bit more.
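
A minimal sketch for checking which mode the driver is in. It assumes the kernel exposes the status file described in the amd-pstate documentation at /sys/devices/system/cpu/amd_pstate/status; I have not confirmed that against this exact kernel build. The function name pstate_mode and the STATUS_FILE variable are made up for illustration.

```shell
#!/bin/sh
# Sketch: report the amd-pstate operation mode (active, passive, guided, ...).
# STATUS_FILE is a hypothetical override; the default path is an assumption
# taken from the kernel's amd-pstate documentation.
STATUS_FILE="${STATUS_FILE:-/sys/devices/system/cpu/amd_pstate/status}"

pstate_mode() {
    # $1: path to the amd_pstate status file
    if [ -r "$1" ]; then
        cat "$1"
    else
        # File is absent, for example when acpi-cpufreq is in use instead.
        echo "unavailable"
    fi
}

# Usage:
#   pstate_mode "$STATUS_FILE"
#   # As root, a runtime switch that should not survive a reboot:
#   echo passive > "$STATUS_FILE"
```

If the runtime switch works, the boost file should show up again. A persistent alternative would be a kernel boot parameter such as amd_pstate=passive.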

There is a lot of documentation for the amd-pstate driver. It isn't meant for the casual user.

There is a big difference between acpi-cpufreq and amd-pstate. amd-pstate is the future, but perhaps not today (for me). With acpi-cpufreq and turbo boost disabled I should get only one of three CPU frequencies, while with amd-pstate I can get many more. From cpupower frequency-info output:

analyzing CPU 7:
  driver: amd-pstate-epp
  CPUs which run at the same hardware frequency: 7
  CPUs which need to have their frequency coordinated by software: 7
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 400 MHz - 5.61 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 400 MHz and 5.61 GHz.
                  The governor "powersave" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 2.97 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: no

For now I will just roll back to the acpi-cpufreq driver while figuring this out, and possibly wait for Linux 6.6 to show up on Ubuntu 22.04. I am not sure how mature amd-pstate is, and I won't get support for cpupower set --turbo-boost 1 until 6.6 arrives.

I now have this in /etc/default/grub: 

GRUB_CMDLINE_LINUX_DEFAULT="pcie_aspm=off nosmt amd_pstate=disable"

  • pcie_aspm=off is there to avoid correctable PCI errors (maybe Beelink BIOS needs an update)
  • nosmt disables hyperthreads because BIOS doesn't have an option for that
  • amd_pstate=disable lets me use the acpi-cpufreq driver 
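
Changes to /etc/default/grub only take effect after regenerating the GRUB config (sudo update-grub on Ubuntu) and rebooting. Below is a minimal sketch for confirming that a parameter reached the running kernel. The function name cmdline_has is made up for illustration, and the second argument exists only so the function can be tried against a sample file.

```shell
#!/bin/sh
# Sketch: check whether an exact parameter is on the kernel command line.
cmdline_has() {
    # $1: parameter to look for, e.g. amd_pstate=disable
    # $2: file holding the command line (defaults to /proc/cmdline)
    # Split on spaces so that "smt" does not match "nosmt".
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Usage after a reboot:
#   cmdline_has amd_pstate=disable && echo "amd-pstate is disabled"
#   cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
```

The scaling_driver file should then report acpi-cpufreq rather than amd-pstate-epp.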

CPU frequencies with acpi-cpufreq

This shows the CPU frequencies I get from an idle server with the acpi-cpufreq driver. Note that I mostly get only 3 values when boost is disabled (set to 0).

With /sys/devices/system/cpu/cpufreq/boost set to 0

  current CPU frequency: 2.18 GHz (asserted by call to kernel)
  current CPU frequency: 1.50 GHz (asserted by call to kernel)
  current CPU frequency: 3.80 GHz (asserted by call to kernel)
  current CPU frequency: 1.60 GHz (asserted by call to kernel)
  current CPU frequency: 1.60 GHz (asserted by call to kernel)
  current CPU frequency: 1.60 GHz (asserted by call to kernel)
  current CPU frequency: 1.60 GHz (asserted by call to kernel)
  current CPU frequency: 3.80 GHz (asserted by call to kernel)

With /sys/devices/system/cpu/cpufreq/boost set to 1

  current CPU frequency: 1.60 GHz (asserted by call to kernel)
  current CPU frequency: 2.04 GHz (asserted by call to kernel)
  current CPU frequency: 2.11 GHz (asserted by call to kernel)
  current CPU frequency: 1.60 GHz (asserted by call to kernel)
  current CPU frequency: 1.60 GHz (asserted by call to kernel)
  current CPU frequency: 1.57 GHz (asserted by call to kernel)
  current CPU frequency: 1.60 GHz (asserted by call to kernel)
  current CPU frequency: 3.21 GHz (asserted by call to kernel)
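
One way to tally per-core frequencies like those above is to count how many cores report each value. This is a minimal sketch that assumes the "asserted by call to kernel" line format shown in this post; the function name freq_histogram is made up for illustration.

```shell
#!/bin/sh
# Sketch: count how many cores report each frequency.
freq_histogram() {
    # Reads cpupower frequency-info output on stdin and keeps only the
    # frequency text from the "asserted by call to kernel" lines.
    grep 'asserted by call to kernel' |
        sed -e 's/^ *current CPU frequency: //' -e 's/ (asserted.*$//' |
        sort | uniq -c | sort -rn
}

# Usage: cpupower -c all frequency-info | freq_histogram
```

Each output line is a count followed by a frequency, most common first. Running it a few times on an idle server gives a rough picture of how many distinct frequencies show up.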

Appendix

Note that cpupower frequency-info only shows frequencies for one core. To see them all use cpupower -c all frequency-info.

Output from cpupower frequency-info with active mode

analyzing CPU 7:
  driver: amd-pstate-epp
  CPUs which run at the same hardware frequency: 7
  CPUs which need to have their frequency coordinated by software: 7
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 400 MHz - 5.61 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 400 MHz and 5.61 GHz.
                  The governor "powersave" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 2.97 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: no

Output from cpupower frequency-info with guided mode

analyzing CPU 7:
  driver: amd-pstate
  CPUs which run at the same hardware frequency: 7
  CPUs which need to have their frequency coordinated by software: 7
  maximum transition latency: 20.0 us
  hardware limits: 400 MHz - 5.61 GHz
  available cpufreq governors: conservative ondemand userspace powersave performance schedutil
  current policy: frequency should be within 400 MHz and 5.61 GHz.
                  The governor "schedutil" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 1.44 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes
    AMD PSTATE Highest Performance: 214. Maximum Frequency: 5.61 GHz.
    AMD PSTATE Nominal Performance: 145. Nominal Frequency: 3.80 GHz.
    AMD PSTATE Lowest Non-linear Performance: 42. Lowest Non-linear Frequency: 1.10 GHz.
    AMD PSTATE Lowest Performance: 16. Lowest Frequency: 400 MHz.

Output from cpupower frequency-info with passive mode

analyzing CPU 7:
  driver: amd-pstate
  CPUs which run at the same hardware frequency: 7
  CPUs which need to have their frequency coordinated by software: 7
  maximum transition latency: 20.0 us
  hardware limits: 400 MHz - 5.61 GHz
  available cpufreq governors: conservative ondemand userspace powersave performance schedutil
  current policy: frequency should be within 400 MHz and 5.61 GHz.
                  The governor "schedutil" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 2.74 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes
    AMD PSTATE Highest Performance: 214. Maximum Frequency: 5.61 GHz.
    AMD PSTATE Nominal Performance: 145. Nominal Frequency: 3.80 GHz.
    AMD PSTATE Lowest Non-linear Performance: 42. Lowest Non-linear Frequency: 1.10 GHz.
    AMD PSTATE Lowest Performance: 16. Lowest Frequency: 400 MHz.

Output from /proc/cpuinfo

processor : 7
vendor_id : AuthenticAMD
cpu family : 25
model : 116
model name : AMD Ryzen 7 7840HS w/ Radeon 780M Graphics
stepping : 1
microcode : 0xa704103
cpu MHz : 3800.000
cache size : 1024 KB
physical id : 0
siblings : 8
core id : 7
cpu cores : 8
apicid : 14
initial apicid : 14
fpu : yes
fpu_exception : yes
cpuid level : 16
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov succor smca flush_l1d
bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass srso
bogomips : 7585.46
TLB size : 2560 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14] [15]

Status as of April 18, 2024

The problem isn't resolved. I tried both the current HWE and non-HWE kernels with a variety of kernel boot params but the result was that one or both of the new SER7 servers were slower. I tried all of these:
GRUB_CMDLINE_LINUX_DEFAULT="pcie_aspm.policy=performance"
GRUB_CMDLINE_LINUX_DEFAULT="pcie_aspm=off nosmt amd_pstate=disable"
GRUB_CMDLINE_LINUX_DEFAULT="pcie_aspm=off"
GRUB_CMDLINE_LINUX_DEFAULT="amd_pstate=disable"
GRUB_CMDLINE_LINUX_DEFAULT="nosmt"
GRUB_CMDLINE_LINUX_DEFAULT=""

Results from tests are here. In many cases the CPU overhead (user+system) is significantly different on the new servers compared to the old one.

The selection of small servers (with small TDP) that don't have a mix of performance and efficiency cores is limited (the few available use AMD Ryzen CPUs). I might replace the Beelink SER7 with ASUS PN53.

I asked Beelink support for a copy of the v28 BIOS that the old (good) server uses. They provided it and I installed it, but the errors remain.

From dmidecode -t bios the BIOS versions are:
v28 (good one used by old server)
BIOS Information
Vendor: American Megatrends International, LLC.
Version: SER7PRO_P5C8V28
Release Date: 08/14/2023

v38 (used by both new servers that have the errors)
BIOS Information
Vendor: American Megatrends International, LLC.
Version: SER7PRO_P5C8V38
Release Date: 01/10/2024

An example of the errors is:
[Fri Apr 19 17:14:03 2024] pcieport 0000:00:01.2: AER: Corrected error received: 0000:01:00.0
[Fri Apr 19 17:14:03 2024] nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
[Fri Apr 19 17:14:03 2024] nvme 0000:01:00.0:   device [144d:a80a] error status/mask=00000001/0000e000
[Fri Apr 19 17:14:03 2024] nvme 0000:01:00.0:    [ 0] RxErr                  (First)
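
To compare servers or boot parameters it can help to count these errors per device. This is a minimal sketch that assumes the log format shown above (dmesg -T style timestamps); the function name aer_counts is made up for illustration.

```shell
#!/bin/sh
# Sketch: count corrected PCIe bus errors per device.
aer_counts() {
    # Reads dmesg output on stdin. Strips the leading [timestamp] and
    # everything from ": PCIe Bus Error" onward, leaving "driver address".
    grep 'PCIe Bus Error: severity=Corrected' |
        sed -e 's/^\[[^]]*] *//' -e 's/: PCIe Bus Error.*$//' |
        sort | uniq -c | sort -rn
}

# Usage: sudo dmesg -T | aer_counts
```

Each output line is a count followed by the driver name and PCI address, so a server with no corrected errors prints nothing.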
