Topic: Data Reliability, RAID, and why you really do need to have a backup

Eastmarch

I wanted to take the time to post this as an overview of data systems reliability and to dispel some common misconceptions along the way. Considering that we are launching a Data Recovery service in the US, I want to try to convince everyone reading this that having a good backup (and consequently never needing to use said service) is critical. Data recovery almost always costs about 6-10x what a backup would have, and we’d be way happier over here in tech support if we never again talked to anyone who had lost all their data without a backup.

In this world nothing can be said to be certain, except death and taxes.
-Benjamin Franklin


All of this is based on one simple fact: There is no such thing as a 100% reliable system.

Seems simple, right? But the consequences of the truth of that statement are huge. Even that triumph of human engineering prowess, the global commercial airline safety record, isn’t 100% safe. It’s something like 99.99999375% non-fatal as of 2017, which is good enough for most people.

But I need to stress that no computer system is anything near that reliable. Reliability numbers for rotational hard drives are very hard to come by, but you should expect to see a 1-3% annual failure rate over the first three years of drive life, with numbers rising thereafter. That means that if you have 4 drives that could fail, you have only about a 78.5% chance of going three years without a failure (using a 2% rate and assuming the drives fail independently: 0.98^12 ≈ 0.785).

Now that might not sound bad. But if someone told me there was a 1 in 5 chance of going down in flames on my flight to Cleveland, I’d just drive. And so would everyone else.
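
For anyone who wants to plug in their own numbers, here is a rough sketch of that drive arithmetic in Python. It assumes every drive fails independently at a constant annual rate, which real drives do not strictly do, and the function name is just something made up for illustration.

```python
def survival_probability(annual_failure_rate: float, drives: int, years: int) -> float:
    """Chance that none of `drives` fails within `years`.

    Assumes independent failures at a constant annual rate, so each
    drive-year is survived with probability (1 - annual_failure_rate).
    """
    return (1.0 - annual_failure_rate) ** (drives * years)


# Four drives over three years, across the 1-3% range quoted above.
for rate in (0.01, 0.02, 0.03):
    p = survival_probability(rate, drives=4, years=3)
    print(f"{rate:.0%} annual rate: {p:.1%} chance of no failure, "
          f"{1 - p:.1%} chance of at least one")
```

Adding drives or years only ever pushes the "no failure" number down, which is the whole point.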

Add to that the fact that the computer running those drives has a nonzero chance of catastrophic failure, and the software that runs it also has a nonzero chance of impressively failing in some random and unreproducible way, and you start to see the problem. If your data is irreplaceable, has significant monetary value, or both, then running without a backup of the data is simply an unacceptable strategy.

Three things are certain:
Death, taxes, and lost data.
Guess which has occurred.
- David Dixon, Winner of a haiku error message competition


But what about RAID? Surely if one mirrors the drives, or uses the arcane magic of ‘parity’, or some eldritch combination thereof, then we can relax, sit back, put our feet up on the server rack and enjoy the satisfaction of having protected our precious bits from the ravening wolves of entropy – right?

Yeah, not so much. RAID makes you -tolerant- of -one kind- of failure. It isn’t failure-PROOF (again, no such thing as 100% reliability) and will not protect you against:

•   Accidental deletion
•   Multi-drive failure (some RAID levels tolerate more than one failed drive, but not all)
•   Data Corruption
•   Malicious Software
•   Power Surges
•   Full unit failure
•   Sunspots

Or really anything that isn’t a simple failure on a well-behaved drive that does not choose to rage against the dying of the light, writing corrupted data across your array in the process before succumbing to its inevitable demise. 

RAID’s ONLY purpose is to make it so that when you do lose a drive, and it doesn’t cause other problems, you experience no downtime. Maybe you SLOW down, but you don’t GO down.
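
To take some of the mystery out of parity, here is a toy sketch in Python of single-parity XOR, the idea behind RAID 5-style protection. It is an illustration only, not how any particular controller or NAS implements it: the "drives" are four-byte strings and `xor_blocks` is a made-up helper.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three toy "data drives" plus one parity block computed from them.
drives = [b"cats", b"dogs", b"fish"]
parity = xor_blocks(drives)

# Case 1: drive 1 dies cleanly. The survivors plus parity rebuild it.
rebuilt_after_failure = xor_blocks([drives[0], drives[2], parity])
print(rebuilt_after_failure == b"dogs")  # True

# Case 2: drive 1 is overwritten (deletion, corruption, malware).
# The array dutifully updates parity to match the bad data, so a
# "rebuild" just hands the bad data back.
drives[1] = b"oops"
parity = xor_blocks(drives)
rebuilt_after_overwrite = xor_blocks([drives[0], drives[2], parity])
print(rebuilt_after_overwrite)  # b'oops'
```

Parity only knows how to recompute a missing block from the others. It has no memory of what the data used to be, which is why it cannot stand in for a backup.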

Much like having two proverbial eggs in a single proverbial basket doesn’t guarantee you can have a proverbial omelet whenever you proverbially want, RAID doesn’t protect you from (not at all proverbial but very real) data loss.

“If you can’t afford a backup, you can’t afford the data. They should not be seen as separate things.”
-- Me


Quoting yourself is a bit presumptuous, but that makes it no less true. Companies and end users take great care to physically protect their property. They lock their doors, install alarms, badge readers and camera systems, buy insurance policies and think nothing of it. But in nearly all cases, the data on a storage system is of much, much greater value than the system itself! If the only copy of your wedding pictures, or of your kid’s childhood, is on a $200 hard drive, are they worth more than that? Would your spouse agree? If the drive died tomorrow, would you pay more than that to get them back?

Similarly, if the only copy of all your accounting data for your company is on a $10,000 server, that data is quite simply worth more than every other asset in the company combined. Replacing the hardware might be painful, but replacing the data would be impossible, and likely fatal to the company’s future.

The dichotomy between the value of the machine and the value of the data on the machine is seen nowhere more clearly than when getting a warranty replacement. To be clear, a warranty doesn’t say it WON’T break. It says that if it DOES break, you’ll get a repaired or new one, but your fancy arrangement of bits that makes up pictures of cats or mission-critical databases isn’t warrantied. So sure, you get a part or unit in the mail with that ‘new-computer’ smell that no one with any geek cred can resist, but that leaves you with the task of flipping all those bits back into the proper arrangement. You remember what they all were, right?

Framed against this background, when I hear statements like, “we can’t afford a UPS”, or “they wouldn’t pay for a backup unit” (whoever ‘they’ are), I wonder whether those in charge are aware that they are one system fault away from disaster.

Running a backup is easy. All our NAS devices can do it using an automated process, either to another NAS, or to a USB hard drive. Some of our TeraStations can back up to Dropbox, or Amazon S3, or MS Azure. We have knowledge base articles on how to set any of that stuff up, and if you are still in your support period, we can help you set it up.
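
If you would rather see the basic idea than take my word for it, here is a generic sketch in Python of "copy it somewhere else, then check the copy". This is not the NAS's built-in backup job; the function names are made up for illustration, and it assumes Python 3.9 or newer.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: Path, destination: Path) -> list[Path]:
    """Copy the `source` tree into `destination`, then compare checksums.

    Returns the relative paths whose copies are missing or do not match;
    an empty list means every file was copied intact.
    """
    shutil.copytree(source, destination, dirs_exist_ok=True)
    mismatched = []
    for src_file in source.rglob("*"):
        if src_file.is_file():
            relative = src_file.relative_to(source)
            dst_file = destination / relative
            if not dst_file.is_file() or sha256_of(src_file) != sha256_of(dst_file):
                mismatched.append(relative)
    return mismatched
```

You would call it as something like `backup_and_verify(Path("/data/photos"), Path("/mnt/usb/photos"))`, where both paths are hypothetical. The destination needs to be a different physical device to count as a backup, and a copy that is never checked is only a hope.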

In the end, if you are the kind of leather jacket wearing, pack of cigarettes rolled up in your sleeve-type rebel who lives by your own rules and takes it to the limit, you could totally not have a backup. Laughing in the face of danger, you could disdain those who play it safe and dare to live free. Just… try not to get too much sticker shock if you end up needing that recovery after all.

**A single copy of data, even on a RAID array, is NOT a backup! Hard drive failure is not a question of IF, but WHEN! Don't take my word for it, take Google's!**