Over the last couple of days, a small but growing number of errors have been reported on disk one of my NAS (RAID 1, so mirrored). I wasn’t too concerned, as it didn’t look calamitous. I had a spare disk somewhere, and I would fix it when I had a moment. Meanwhile, the NAS worked around the errors.
Last night, the RAID simply shut down, and it wouldn’t come up again this morning. I unplugged what I took to be the faulty drive, disk one, and tried starting the NAS with the single remaining drive: nothing.
This didn’t hold me up at all, as I keep the same data backed up to yet another drive, so it was just a matter of pointing my mapped drives elsewhere. But at moments during the day, I would try something on the old NAS, and it would still fail.
Then tonight, I put both disks back in, turned it on, and looked at the error lights that were flashing. I had been wrong: it wasn’t disk one, the one with the errors, that had failed, but disk two, which was totally, utterly dead.
I then restarted the NAS with disk one in place and disk two removed, and it all worked straight away. A very quick copy of the few bits of data that hadn’t made the last backup means that *all* my data is safe. A new drive has replaced disk two, and the NAS is resyncing. Sometime tomorrow, when this is done, I will replace the errored disk one with another new drive, and let that sync from the replaced disk two.
What we learn from this:
- Always have a backup. If you can have more than one backup, that’s even better.
- When something goes wrong, look, and look again, before rushing into action. The problem might not be the “obvious” thing you think it is.
- When you decide a disk needs replacing, replace it then. Don’t think “oh, it will probably be ok for another day or two”.