Saturday, October 21, 2017

Wearing out an SSD

I use Intel NUC servers at home to test open-source databases for performance and efficiency. Each server has an SSD for the database, either a Samsung 960 EVO or a Samsung 850 EVO. It is time to replace the 960 EVO after about 5 months of heavy usage. I test MySQL (MyRocks, InnoDB, MyISAM) and MongoDB (MongoRocks, WiredTiger and mmapv1). If I limited myself to MyRocks and MongoRocks then the storage devices would last much longer, thanks to the better write efficiency of an LSM compared to a B-Tree.

I have 3 servers with the 960 EVO and I will replace the SSD in all of them at the same time. I assume that device performance changes as it ages, but I have never tried to quantify that. For the 850 EVO I will buy extra spares and will upgrade from the 120gb device to a 250gb device because they cost the same and the 120gb device is hard to find. I just hope the devices I use right now will last long enough to finish my current round of testing. One day I will switch to EC2 and GCE and wear out their devices, but I like the predictability I get from my home servers.

I use Ubuntu 16.04 and its version of smartctl doesn't yet support NVMe devices, so I used the nvme utility instead. Percona has a useful blog post on this. The percentage_used value is 250%, which means the estimated device endurance has been greatly exceeded. The critical_warning value is 0x4, which per the NVMe spec means NVM subsystem reliability has been degraded due to significant media-related errors or an internal error that degrades NVM subsystem reliability. The data_units_written value counts 512-byte units and is reported in thousands of units. The value 1,400,550,163 means that 652TB has been written to the device. The device is 250GB, so that is about 2700 full device writes. If I wave my hands and expect 2000 full device writes from 3D NAND and ignore overprovisioning (OP) then it seems reasonable that the device is done. I assume that OP is 10% based on available_spare_threshold. The warranty on the 250gb 960 EVO is 3 years or 100 TBW, and I wrote 652TB, so the device delivered far more endurance than its rating and I am happy about that. My previous post on this is here.

This is from the 960 EVO.

$ sudo nvme smart-log /dev/nvme0
Smart Log for NVME device:nvme0 namespace-id:ffffffff
critical_warning                    : 0x4
temperature                         : 32 C
available_spare                     : 100%
available_spare_threshold           : 10%
percentage_used                     : 250%
data_units_read                     : 159,094,604
data_units_written                  : 1,400,550,163
host_read_commands                  : 4,698,541,096
host_write_commands                 : 19,986,018,997
controller_busy_time                : 32,775
power_cycles                        : 30
power_on_hours                      : 3,039
unsafe_shutdowns                    : 7
media_errors                        : 0
num_err_log_entries                 : 0
Warning Temperature Time            : 0
Critical Composite Temperature Time : 0
Temperature Sensor 1                : 32 C
Temperature Sensor 2                : 45 C
Temperature Sensor 3                : 0 C
Temperature Sensor 4                : 0 C
Temperature Sensor 5                : 0 C
Temperature Sensor 6                : 0 C
Temperature Sensor 7                : 0 C
Temperature Sensor 8                : 0 C
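
To double-check the arithmetic, here is a minimal sketch in Python. It assumes, per the NVMe spec, that data_units_written is reported in thousands of 512-byte units, and it uses binary units (TiB, GiB) the way the text above does.

# Check the 960 EVO arithmetic: data_units_written -> TB written -> full device writes.
# Assumes thousands of 512-byte units (NVMe spec) and ignores over-provisioning.
data_units_written = 1400550163                            # from nvme smart-log above
bytes_written = data_units_written * 1000 * 512
tib_written = bytes_written / float(2 ** 40)               # ~652 TiB, the 652TB in the text
device_bytes = 250 * 2 ** 30                               # 250gb device
full_device_writes = bytes_written / float(device_bytes)   # roughly 2700
print("TiB written: %.0f" % tib_written)
print("full device writes: %.0f" % full_device_writes)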

This is from the 850 EVO on the server with the largest value for Total_LBAs_Written. The device has a 512-byte sector, so a Total_LBAs_Written value of 739353756925 means 344TB has been written to the 120gb device. That is about 2900 full device writes assuming no OP. Once again, I should be happy that the device lasted this long. The warranty on the 120gb 850 EVO is 5 years or 75TBW, and I wrote a lot more than 75TB. The Wear_Leveling_Count value is 3335 and that is the average number of P/E cycles. That value is similar to my estimate of 2900 full device writes. I assume that I will get about 2000 P/E cycles from 3D NAND, and I exceeded that.

$ sudo smartctl --all /dev/sdb
...

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       4430
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       11
177 Wear_Leveling_Count     0x0013   001   001   000    Pre-fail  Always       -       3335
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   047   038   000    Old_age   Always       -       53
195 Hardware_ECC_Recovered  0x001a   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
235 Unknown_Attribute       0x0012   099   099   000    Old_age   Always       -       2
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       739353756925

SMART Error Log Version: 1
No Errors Logged
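
The same sanity check for the 850 EVO, again a minimal sketch that assumes a 512-byte logical sector for Total_LBAs_Written and a 120gb device with no OP:

# Check the 850 EVO arithmetic: Total_LBAs_Written -> TB written -> full device writes.
total_lbas_written = 739353756925                          # from smartctl above
bytes_written = total_lbas_written * 512                   # 512-byte sectors
tib_written = bytes_written / float(2 ** 40)               # ~344 TiB, the 344TB in the text
device_bytes = 120 * 2 ** 30                               # 120gb device, no OP
full_device_writes = bytes_written / float(device_bytes)   # roughly 2900, close to Wear_Leveling_Count
print("TiB written: %.0f" % tib_written)
print("full device writes: %.0f" % full_device_writes)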
