At the end of 2019, I built a new FreeNAS server, which was the subject of several blog posts at the time. It had been running well ever since, until I received an e-mail notification earlier this week telling me that FreeNAS0 had suffered an HDD failure!
When you have a storage server containing multiple drives, you have to accept that at some point you'll have an HDD failure. That's the whole point of a redundant storage array, right? FreeNAS0 has eight Western Digital 8TB drives, which I removed from WD My Book external enclosures because buying the external drives was so much cheaper than buying the same drives bare.
They are running in a RAIDZ2 configuration, which is similar to a RAID6 storage array. This is defined on Wikipedia as “a form of RAID that can continue to execute read and write requests to all of a RAID array’s virtual disks in the presence of any two concurrent disk failures”. So a single drive failure is nothing to panic about, right?
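To put some rough numbers on that, here's a small Python sketch of the RAIDZ2 arithmetic for FreeNAS0's eight 8TB drives. It's a deliberate simplification on my part, not how ZFS actually reports space: it ignores ZFS metadata, padding and the TB/TiB difference, and the `raidz_summary` helper is purely hypothetical.

```python
def raidz_summary(drives: int, parity: int, drive_tb: float, failed: int = 0) -> dict:
    """Rough RAIDZ arithmetic (hypothetical helper).

    Assumption: ignores ZFS metadata, padding and TB vs TiB, so the
    'usable' figure is only a ballpark upper bound.
    """
    if not 0 <= parity < drives:
        raise ValueError("parity must be between 0 and drives - 1")
    data_drives = drives - parity  # drives' worth of space left for data
    return {
        "raw_tb": drives * drive_tb,
        "usable_tb_approx": data_drives * drive_tb,
        "failures_tolerated": parity,  # RAIDZ2 -> 2
        "further_failures_tolerated": max(parity - failed, 0),
        "data_lost": failed > parity,  # more failures than parity drives
    }


# FreeNAS0: 8 x 8TB in RAIDZ2, with one drive failed
summary = raidz_summary(drives=8, parity=2, drive_tb=8, failed=1)
print(summary)
# {'raw_tb': 64, 'usable_tb_approx': 48, 'failures_tolerated': 2, 'further_failures_tolerated': 1, 'data_lost': False}
```

With one drive down, the pool still has one drive's worth of redundancy left, which is why a single failure isn't a reason to panic, as long as it gets replaced before a second drive goes.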
Well, anything that means moving, opening and/or replacing something in my FreeNAS server always causes a minor panic, but the answer to the question is ‘of course’. Regardless of how many parity disks I have, I also have a second FreeNAS server which receives a nightly backup of FreeNAS0. And my backup strategy doesn’t stop there!
I also have an Ubuntu VM that mounts the really important data on FreeNAS0 (my pictures, music, documents, etc.) and is connected to CrashPlan, which stores about 3TB of irreplaceable data in the cloud. So the only real loss would be the time spent recovering or restoring data if something catastrophic happened to FreeNAS0.
Fortunately, things could not really have been much simpler. I'd bought nine drives when I built FreeNAS0, so I had a cold spare in the garage. The first job was to remove it from its enclosure, taking care not to break anything so that I could put the failed drive back in and return it to WD under warranty. There are plenty of videos online that show how to do this, and it's pretty straightforward if you take your time.
Next, I shut FreeNAS0 down and unplugged everything. Identifying the failed drive was pretty simple, as the serial numbers are printed on the visible edge of the drives. It was the second down from the top, so I disconnected the power and data cables and slid out the drive cage. Four screws released it; I then fastened the new drive in and reversed the process to connect everything back up.
I also had to cover the 3rd pin of the new drive's SATA power connector with Kapton tape, which I covered in a previous post. When I fired FreeNAS0 back up and waited for everything to come back online, the failed drive was showing as removed, so I simply replaced it with the new da1 from the Web UI. The pool came straight back online, but it then had to resilver the new drive so that it contained the missing data. When the resilver started, it was estimating over 3 days, but in reality it was complete in less than 24 hours.
And everything is back to how it was before I saw the e-mail alert. Well, almost. I still have to RMA the faulty drive, which I managed to slot back into its enclosure using the correct insert and the box matching the serial number on the drive. I'd kept them all in the garage, so the only challenge was finding the right one! That's all packaged up and ready to return, so hopefully I'll get a replacement from WD, which will go back into the box as a cold spare for the next failure.
Fingers crossed that’s at least another 12 months away…