Each NUC has one SATA disk and one SSD. Most tests use the SSD because the disk has the OS install and I don't want to lose that install when too much testing makes the disk unhappy. My current SSD is a 120GB Samsung 850 EVO, and one of these became sick.
[3062127.595842] attempt to access beyond end of device
[3062127.595847] sdb1: rw=129, want=230697888, limit=230686720
[3062127.595850] XFS (sdb1): discard failed for extent [0x7200223,8192], error 5
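The want value is only a little past the limit (230697888 - 230686720 = 11168 sectors, a bit over 5 MB), and the XFS message suggests it was a discard request that ran past the end of the partition. A quick sanity check, assuming the partition is still /dev/sdb1, is to compare the partition size in 512-byte sectors against the limit the kernel reported:
# partition size in 512-byte sectors; should match the "limit" above
sudo blockdev --getsz /dev/sdb1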
Other error messages were amusing.
[2273399.254789] Uhhuh. NMI received for unknown reason 3d on CPU 3.
[2273399.254818] Do you have a strange power saving mode enabled?
[2273399.254840] Dazed and confused, but trying to continue
What does smartctl say? I am interested in Wear_Leveling_Count. The raw value is 1656, and if that means what I think it means then this device can go to 2000 thanks to 3D TLC NAND (aka 3D V-NAND). The VALUE is 022, and since that counts down from 100 to 0 the device is about 80% done, so Wear_Leveling_Count might reach 2000. I created a new XFS filesystem on the device, rebooted the server and restarted my test. I don't think I need to replace this SSD today.
sudo smartctl -a /dev/sdb1
...
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 4323
12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 42
177 Wear_Leveling_Count 0x0013 022 022 000 Pre-fail Always - 1656
179 Used_Rsvd_Blk_Cnt_Tot 0x0013 100 100 010 Pre-fail Always - 0
181 Program_Fail_Cnt_Total 0x0032 100 100 010 Old_age Always - 0
182 Erase_Fail_Count_Total 0x0032 100 100 010 Old_age Always - 0
183 Runtime_Bad_Block 0x0013 100 100 010 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0032 055 049 000 Old_age Always - 45
195 Hardware_ECC_Recovered 0x001a 200 200 000 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x003e 100 100 000 Old_age Always - 0
235 Unknown_Attribute 0x0012 099 099 000 Old_age Always - 9
241 Total_LBAs_Written 0x0032 099 099 000 Old_age Always - 365781411804
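To watch just this attribute over time rather than reading the full report, grepping the attribute table is enough; a minimal sketch, assuming smartmontools is installed and the device is still /dev/sdb1:
# print only attribute 177 (Wear_Leveling_Count) from the SMART attribute table
sudo smartctl -A /dev/sdb1 | grep Wear_Leveling_Count
If the normalized VALUE really counts down from 100 to 0, then 022 means roughly 78% of the rated wear is used, and 1656 / 0.78 is about 2100, which is consistent with the guess that the raw value might reach about 2000.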
Mark,
Usually I see drives perform for quite a while after the wear level indicator hits zero. I also have a bunch of NUCs with inexpensive SSDs I use for testing :)
Does performance get a lot worse as the device reaches its advertised limit?
Frankly I did not measure. What physically would you expect to cause a slowdown, unless there are correctable errors which require remapping?
Would be great if vendors were willing to make a statement about that.
Just read about SSDs here (there are more articles about this stress test):
ReplyDeletehttp://techreport.com/review/27909/the-ssd-endurance-experiment-theyre-all-dead
Those are a few generations behind; newer disks are better and last longer. But still, this kind of memory has its internal limitations and will break at some point.