I use Intel NUC servers at home to test open-source databases for performance and efficiency. Each server has an SSD for the database, either a Samsung 960 EVO or a Samsung 850 EVO. It is time to replace the 960 EVO after about 5 months of heavy usage. I test MySQL (MyRocks, InnoDB, MyISAM) and MongoDB (MongoRocks, WiredTiger and mmapv1). If I limited myself to MyRocks and MongoRocks then the storage devices would last much longer, courtesy of the better write efficiency of an LSM versus a B-Tree.
I have 3 servers with the 960 EVO and I will replace the SSD in all of them at the same time. I assume that device performance changes as it ages, but I have never tried to quantify that. For the 850 EVO I will buy extra spares and will upgrade from the 120GB device to a 250GB device because they cost the same and the 120GB device is hard to find. I just hope the devices I use right now will last long enough to finish my current round of testing. One day I will switch to EC2 and GCE and wear out their devices, but I like the predictability I get from my home servers.
I use Ubuntu 16.04 and its version of smartctl doesn't yet support NVMe devices, so I used the nvme utility. Percona has a useful blog post on this. The percentage_used value is 250%, which means the estimated device endurance has been greatly exceeded. The critical_warning value is 0x4, which per the NVMe spec means NVM subsystem reliability has been degraded due to significant media-related errors or an internal error that degrades NVM subsystem reliability. The data_units_written value is the number of 512-byte units written, reported in thousands, so the value 1,400,550,163 means that 652TB has been written to the device. The device is 250GB, so that is about 2700 full device writes. If I wave my hands, expect 2000 full device writes from 3D NAND, and ignore overprovisioning (OP), then it seems reasonable that the device is done. I assume that OP is 10% based on available_spare_threshold. The warranty on the 250GB 960 EVO is 3 years or 100 TBW, and I wrote 652TB, so I am happy about that. My previous post on this is here.
This is from the 960 EVO.
$ sudo nvme smart-log /dev/nvme0
Smart Log for NVME device:nvme0 namespace-id:ffffffff
critical_warning : 0x4
temperature : 32 C
available_spare : 100%
available_spare_threshold : 10%
percentage_used : 250%
data_units_read : 159,094,604
data_units_written : 1,400,550,163
host_read_commands : 4,698,541,096
host_write_commands : 19,986,018,997
controller_busy_time : 32,775
power_cycles : 30
power_on_hours : 3,039
unsafe_shutdowns : 7
media_errors : 0
num_err_log_entries : 0
Warning Temperature Time : 0
Critical Composite Temperature Time : 0
Temperature Sensor 1 : 32 C
Temperature Sensor 2 : 45 C
Temperature Sensor 3 : 0 C
Temperature Sensor 4 : 0 C
Temperature Sensor 5 : 0 C
Temperature Sensor 6 : 0 C
Temperature Sensor 7 : 0 C
Temperature Sensor 8 : 0 C
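As a sanity check on the arithmetic above, here is a minimal sketch with bc, assuming data_units_written counts thousands of 512-byte units (the input value is copied from the smart-log output):

# data_units_written is thousands of 512-byte units, so this is bytes written
$ echo "1400550163 * 1000 * 512" | bc
717081683456000
# bytes to TiB, which is the 652TB quoted above
$ echo "717081683456000 / 1024^4" | bc
652
# full device writes for the 250GB device, ignoring OP
# (roughly 2700-2900 depending on whether capacity is taken as GB or GiB)
$ echo "717081683456000 / (250 * 1000^3)" | bc
2868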
This is from the 850 EVO and the server with the largest value for Total_LBAs_Written. The device has a 512-byte sector, so with Total_LBAs_Written at 739,353,756,925 about 344TB has been written to the 120GB device. That is about 2900 full device writes, assuming no OP. Once again, I should be happy that the device lasted this long. The warranty on the 120GB 850 EVO is 5 years or 75 TBW and I wrote a lot more than 75TB. The Wear_Leveling_Count value is 3335, which is the average number of P/E cycles. That is similar to my estimate of 2900 full device writes. I assume that I will get about 2000 P/E cycles from 3D NAND and I exceeded that.
$ sudo smartctl --all /dev/sdb
...
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 4430
12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 11
177 Wear_Leveling_Count 0x0013 001 001 000 Pre-fail Always - 3335
179 Used_Rsvd_Blk_Cnt_Tot 0x0013 100 100 010 Pre-fail Always - 0
181 Program_Fail_Cnt_Total 0x0032 100 100 010 Old_age Always - 0
182 Erase_Fail_Count_Total 0x0032 100 100 010 Old_age Always - 0
183 Runtime_Bad_Block 0x0013 100 100 010 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0032 047 038 000 Old_age Always - 53
195 Hardware_ECC_Recovered 0x001a 200 200 000 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x003e 100 100 000 Old_age Always - 0
235 Unknown_Attribute 0x0012 099 099 000 Old_age Always - 2
241 Total_LBAs_Written 0x0032 099 099 000 Old_age Always - 739353756925
SMART Error Log Version: 1
No Errors Logged
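The same check for the 850 EVO, a minimal sketch assuming a 512-byte logical sector (Total_LBAs_Written is copied from the smartctl output above):

# Total_LBAs_Written * 512-byte sectors = bytes written
$ echo "739353756925 * 512" | bc
378549123545600
# bytes to TiB, which is the 344TB quoted above
$ echo "378549123545600 / 1024^4" | bc
344
# full device writes for the 120GB device, ignoring OP
# (roughly 2900-3200 depending on whether capacity is taken as GB or GiB)
$ echo "378549123545600 / (120 * 1000^3)" | bc
3154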
Comments:

Great! Wearing out the SSD is always a happy moment. While reading this post I realized that I forgot to check my SSD's internal values using smartctl; I should check them right now. I am also considering moving my experimental environment to EC2 or GCE, but, as you said, predictability is important and I worry that it might cost too much.
Are there any "visible" effects from this wearing out yet? available_spare seems to indicate 100%, which suggests to me that the device itself still thinks it is working? If not: I'm impressed that it still seems to function without errors!
I still have the old devices. Maybe I will run IO benchmarks with fio to find out whether there is a difference, or just repeat my MySQL tests.
Hi Mark, thanks for sharing the data. It is very interesting: flash devices tend to be able to take many more writes than their rated endurance, though performance may degrade due to the extra error correction and block relocation required. One thing that is also affected by wear is how long the SSD will hold data with power off - often enough such an old SSD might lose the data if you shut the server off for a few days.

That would be a great experiment for someone to validate.
Nice post.

Great insights on SSD longevity! I've been using the Samsung PM1743, and its durability is impressive. Thanks for the valuable information!