Topic: Data Reliability, RAID, and why you really do need to have a backup

Eastmarch

I wanted to take the time to post this as an overview of data systems reliability, and to dispel some common misconceptions along the way. Considering that we are launching a Data Recovery service in the US, I want to try to convince everyone reading this that having a good backup (and consequently never needing to use said service) is critical. Data recovery almost always costs about 6-10x what a backup would have, and we’d be way happier over here in tech support if we never again talked to anyone who had lost all their data without a backup.

In this world nothing can be said to be certain, except death and taxes.
-Benjamin Franklin


All of this is based on one simple fact: There is no such thing as a 100% reliable system.

Seems simple, right? But the consequences of the truth of that statement are huge. Even that triumph of human engineering prowess, the global commercial airline safety record, isn’t 100% safe. It’s something like 99.99999375% non-fatal as of 2017, which is good enough for most people.

But I need to stress that no computer system is anything near that reliable. Reliability numbers for rotational hard drives are very hard to come by, but you should expect to see a 1-3% annual failure rate over the first three years of drive life, with numbers rising thereafter. Using a 2% rate and assuming the drives fail independently, a single drive has about a 94% chance of surviving three years (0.98^3), but a set of 4 drives has only about a 78% chance of going three years without any failure (0.98^12).

Now that doesn’t sound terrible. But if someone told me there was a 1 in 5 chance of going down in flames on my flight to Cleveland, I’d just drive. And so would everyone else.
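
For anyone who wants to plug in their own numbers, the arithmetic can be sketched in a few lines of Python. This is a simplification, not a reliability model: it assumes every drive fails independently at the same constant annual rate, and the function name survival_probability is just an illustrative choice.

```python
def survival_probability(annual_failure_rate: float, drives: int, years: int) -> float:
    """Chance that none of `drives` fails within `years`.

    Assumes independent failures at a constant annual rate, so each
    drive-year is survived with probability (1 - annual_failure_rate).
    """
    return (1.0 - annual_failure_rate) ** (drives * years)


if __name__ == "__main__":
    # Sweep the 1-3% range quoted above for a 4-drive unit over 3 years.
    for rate in (0.01, 0.02, 0.03):
        p = survival_probability(rate, drives=4, years=3)
        print(f"{rate:.0%} annual rate: {p:.1%} no failure, {1 - p:.1%} at least one failure")
```

Real drives do not fail independently (same batch, same power supply, same shelf), so treat the result as an optimistic estimate.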

Add to that the fact that the computer running those drives has a nonzero chance of catastrophic failure, and the software that runs it also has a nonzero chance of impressively failing in some random and unreproducible way, and you start to see the problem. If your data is irreplaceable, has significant monetary value, or both, then running without a backup of the data is simply an unacceptable strategy.

Three things are certain:
Death, taxes, and lost data.
Guess which has occurred.
- David Dixon, Winner of a haiku error message competition


But what about RAID? Surely if one mirrors the drives, or uses the arcane magic of ‘parity’, or some eldritch combination thereof, then we can relax, sit back, put our feet up on the server rack and enjoy the satisfaction of having protected our precious bits from the ravening wolves of entropy – right?

Yeah, not so much. RAID makes you -tolerant- of -one kind- of failure. It isn’t failure PROOF (again, no such thing as 100% reliability) and will not protect you against:

•   Accidental deletion
•   Multi-drive failure (some RAID levels do, but not all)
•   Data Corruption
•   Malicious Software
•   Power Surges
•   Full unit failure
•   Sunspots

Or really anything that isn’t a simple failure on a well-behaved drive that does not choose to rage against the dying of the light, writing corrupted data across your array in the process before succumbing to its inevitable demise. 

RAID’s ONLY purpose is to make it so that when you do lose a drive, and it doesn’t cause other problems, you experience no downtime. Maybe you SLOW down, but you don’t GO down.

Much like having two proverbial eggs in a single proverbial basket doesn’t guarantee you can have a proverbial omelet whenever you proverbially want, RAID doesn’t protect you from (not at all proverbial but very real) data loss.

“If you can’t afford a backup, you can’t afford the data. They should not be seen as separate things.”
-- Me


Quoting yourself is a bit presumptuous, but it makes it no less true. Companies and end users take great care to physically protect their property. They lock their doors, install alarms, badge readers and camera systems, buy insurance policies and think nothing of it. But in nearly all cases, the data on a storage system is of much, much greater value than the system itself! If your only copy of your wedding pictures, or your kid’s childhood is on a $200 hard drive, are they worth more than that? Would your spouse agree? If the drive died tomorrow, would you pay more than that to get them back?

Similarly, if the only copy of all your accounting data for your company is on a $10,000 server, that data is quite simply worth more than every other asset in the company combined. Replacing the hardware might be painful, but replacing the data would be impossible, and likely fatal to the company’s future.

The dichotomy between the value of the machine and the value of the data on the machine is seen nowhere more clearly than when getting a warranty replacement. To be clear, a warranty doesn’t say it WON’T break. It says that if it DOES break, you’ll get a repaired or new one, but your fancy arrangement of bits that make up pictures of cats or mission critical databases isn’t warrantied. So sure, you get a part or unit in the mail with that ‘new-computer’ smell that no one with any geek cred can resist, but that leaves you with the task of flipping all those bits back into the proper arrangement. You remember what they all were, right?

Framed against this background, when I hear statements like “we can’t afford a UPS” or “they wouldn’t pay for a backup unit” (whoever ‘they’ are), I wonder if those in charge are aware that they are one system fault away from disaster.

Running a backup is easy. All our NAS devices can do it using an automated process, either to another NAS, or to a USB hard drive. Some of our TeraStations can back up to Dropbox, or Amazon S3, or MS Azure. We have knowledge base articles on how to set any of that stuff up, and if you are still in your support period, we can help you set it up.

In the end, if you are the kind of leather jacket wearing, pack of cigarettes rolled up in your sleeve-type rebel who lives by your own rules and takes it to the limit, you could totally not have a backup. Laughing in the face of danger, you could disdain those who play it safe and dare to live free. Just… try not to get too much sticker shock if you end up needing that recovery after all.


We work hard to provide the best possible chance of data recovery if you need it. Find out more at https://www.buffalotech.com/data-recovery.
« Last Edit: June 07, 2019, 11:38:53 am by Eastmarch »
**A single copy of data, even on a RAID array, is NOT a backup! Hard drive failure is not a question of IF, but WHEN! Don't take my word for it, take Google's!**

acer2974

« Reply #1 on: January 25, 2021, 01:39:01 pm »
Excellent, and all true. You could add that you need at least three copies, with one off site somewhere. This is because there is a human tendency to immediately try the backup copies when the main copy fails. If the failure was caused by something like a hardware error that corrupts data, then the second you install the backup copy, it too will be corrupted. If you have a third or fourth copy nearby, you will immediately install those copies, they will be corrupted as well, and then you are "out of business". As someone who ran a number of very large data centers (several football fields each), I always insisted that we keep a backup copy of critical data in an offsite location that was deliberately not convenient (like maybe Iron Mountain). The offsite facility would not release anything without my signature and a phone check, and then my people would have to explain why this last copy would not be corrupted like the others. If you think this procedure is overdone ("The old man is nuts"), it's not. Take the word of someone who has been in the biz for fifty years! (I probably have socks older than most on this forum.) No one ever died saying that they wished they hadn't made so many backup copies (or eaten so much chocolate cake).

Overall, I think it can also be stated that RAID can only (maybe) protect against drive failure, but not against hardware faults that prevent access, corrupt the data, or kill the NAS. You cannot trust that if you pull out the drives, they will work in another NAS. At home, I have two identical 5400s with a duplicate RAID on each, with each 5400 powered from a separate electrical panel and UPS. (I have a 400 amp service using two standard 200 amp panels.) I resync the two RAIDs in real time using software on my PC which detects any change to the files. I also make a third bulk copy maybe once a week. I wish there was a better way to have real-time sync between two NASes. Maybe there is! Please advise! Appreciate any ideas or arguments! (My wife says I am always wrong about everything.)

Lastly, nothing will prevent stupid mistakes... The only thing we are arguing about is when you will lose your data, not if. Thanks...

dumbuser

« Reply #2 on: January 27, 2021, 01:17:51 pm »
Your post is extremely timely, at least to me.  I suffered a problem with my LS220D a couple of weeks ago and now, I fear, my data is lost forever - I did not have a backup.  Not only did your post address backup, you also talked about a new venture for Buffalo - data recovery. 
My NAS just went off into never-never land and is not addressable by my LAN.  It can't be turned off other than by pulling the cord.  I was running RAID 0.  I have pulled the drives and tried to read them from my Windows 10 PC but Windows wants to re-format the drives.  Is there any way that the data can be recovered?

1000001101000

« Reply #3 on: January 27, 2021, 01:35:21 pm »
You can try connecting them to a Linux system. Your data is stored within an XFS filesystem inside a software RAID (mdadm) array. Depending on the problem, it's possible that the data array is still intact.

sethman

« Reply #4 on: January 27, 2021, 06:14:32 pm »
I have only superficial tech knowledge. I just received my Buffalo LS220 4TB. I set it up in about 20 minutes. I wanted to make sure a RAID was enabled and so I followed the instructions to set one up and saw that it was already there. How do I know my RAID is working? Following the instructions in set up I have access to one drive. I must assume the other drive is a mirror, but can I check it?

Thanks for explaining to me,
sethman

Rich_Morin

« Reply #5 on: February 19, 2021, 02:25:43 pm »
Quote from acer2974: "... you need at least three copies with one off site somewhere.  This is because there is a human tendency to immediately try backup copies when the main fails for a reason like the data was corrupted due to hardware error such that the second you install the backup copy, it too will be corrupted."

Yes, and there are other reasons, as well.  For example, a friend of mine was (barely) able to grab a hard drive on his way out of a burning house (in Santa Rosa, CA USA) a few years ago.  Without that, he would have lost all his bits. Copying files to an offsite backup has the advantage that the backup machine (e.g., a cloud server farm) is unlikely to fall prey to the same disaster that takes out your system.  And if it does, you have much bigger problems than a backup will solve (:-).

That said, the kind of human error mentioned above is really the biggest threat for many folks. So, my plan is to use two RAID arrays and run Time Machine on both of them. After that, I'll find another system (e.g., at my brother's house) where I can store a third RAID array. One advantage of Time Machine in this scenario is that it would only have to transmit changes. Git could be used in a similar manner... FYI, I used to have a system where I'd hand a friend an 8mm tape snapshot whenever I saw him; I told him to stuff it somewhere in a closet or drawer; security through obscurity, FTW...
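
To make the "only transmit changes" idea concrete, here is a minimal sketch in Python. It is not how Time Machine or Git work internally; it is a generic stand-in technique that hashes every file and compares the result with a manifest saved from the previous run. The names file_digest, build_manifest and changed_paths are hypothetical helpers invented for this example.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its content hash."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_paths(previous, current):
    """Files that are new or whose content differs from the previous manifest."""
    return sorted(
        path for path, digest in current.items() if previous.get(path) != digest
    )
```

Only the paths returned by changed_paths would need to cross the wire to the offsite copy. Note that this sketch ignores deletions, and that a manifest kept from an older run is also what lets you notice a file that changed when nobody meant to change it.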

-r