Topic: Data Reliability, RAID, and why you really do need to have a backup

Eastmarch

  • 1500 Lb Water Buffalo
  • Administrator
  • Posts: 339
I wanted to take the time to post this as an overview of data systems reliability and to dispel some common misconceptions. Considering that we are launching a Data Recovery service in the US, I want to try to convince everyone reading this that having a good backup (and consequently never needing to use said service) is critical. Data recovery almost always costs about 6-10x what a backup would have, and we’d be way happier over here in tech support if we never again talked to anyone who had lost all their data without a backup.

In this world nothing can be said to be certain, except death and taxes.
-Benjamin Franklin


All of this is based on one simple fact: There is no such thing as a 100% reliable system.

Seems simple, right? But the consequences of the truth of that statement are huge. Even that triumph of human engineering prowess, the global commercial airline safety record, isn’t 100% safe. It’s something like 99.99999375% non-fatal as of 2017, which is good enough for most people.

But I need to stress that no computer system is anything near that reliable. Reliability numbers for rotational hard drives are very hard to come by, but you should expect to see a 1-3% annual failure rate over the first three years of drive life, with numbers rising thereafter. That means that if you have 4 drives that could fail, you have only about a 78.5% chance of going three years without a failure (using a 2% annual rate per drive and assuming the drives fail independently: 0.98^12 ≈ 0.785).

Now that might not sound terrible. But if someone told me there was more than a 1 in 5 chance of going down in flames on my flight to Cleveland, I’d just drive. And so would everyone else.
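For anyone who wants to check the drive-failure arithmetic, here is a minimal Python sketch. It assumes every drive fails independently at a constant annual rate, which is a simplification (real rates change with age, and drives from the same batch can fail together), and the function name is just for illustration:

```python
def prob_no_failures(drives: int, annual_rate: float, years: int) -> float:
    """Chance that none of `drives` fails within `years`, assuming each drive
    fails independently with probability `annual_rate` in any given year."""
    survive_one_drive_year = 1.0 - annual_rate
    # Every drive has to survive every year: drives * years independent trials.
    return survive_one_drive_year ** (drives * years)


for rate in (0.01, 0.02, 0.03):
    p = prob_no_failures(drives=4, annual_rate=rate, years=3)
    print(f"{rate:.0%}/year: {p:.1%} no failures, {1 - p:.1%} at least one failure")
```

At the 2% rate this comes out to roughly 78.5%, i.e. about a 21.5% chance that at least one of the four drives fails within three years.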

Add to that the fact that the computer running those drives has a nonzero chance of catastrophic failure, and the software that runs it also has a nonzero chance of impressively failing in some random and unreproducible way, and you start to see the problem. If your data is irreplaceable, has significant monetary value, or both, then running without a backup of the data is simply an unacceptable strategy.

Three things are certain:
Death, taxes, and lost data.
Guess which has occurred.
- David Dixon, Winner of a haiku error message competition


But what about RAID? Surely if one mirrors the drives, or uses the arcane magic of ‘parity’, or some eldritch combination thereof, then we can relax, sit back, put our feet up on the server rack and enjoy the satisfaction of having protected our precious bits from the ravening wolves of entropy – right?

Yeah, not so much. RAID makes you -tolerant- of -one kind- of failure. It isn’t failure PROOF (again, no such thing as 100% reliability), and it will not protect you against:

•   Accidental deletion
•   Multi-drive failure (some RAID levels tolerate it, but not all)
•   Data Corruption
•   Malicious Software
•   Power Surges
•   Full unit failure
•   Sunspots

Or really anything that isn’t a simple failure on a well-behaved drive that does not choose to rage against the dying of the light, writing corrupted data across your array in the process before succumbing to its inevitable demise. 

RAID’s ONLY purpose is to make it so that when you do lose a drive, and it doesn’t cause other problems, you experience no downtime. Maybe you SLOW down, but you don’t GO down.

Much like having two proverbial eggs in a single proverbial basket doesn’t guarantee you can have a proverbial omelet whenever you proverbially want, RAID doesn’t protect you from (not at all proverbial but very real) data loss.

“If you can’t afford a backup, you can’t afford the data. They should not be seen as separate things.”
-- Me


Quoting yourself is a bit presumptuous, but that doesn’t make it any less true. Companies and end users take great care to physically protect their property. They lock their doors, install alarms, badge readers and camera systems, buy insurance policies and think nothing of it. But in nearly all cases, the data on a storage system is of much, much greater value than the system itself! If the only copy of your wedding pictures, or of your kid’s childhood, is on a $200 hard drive, are they worth more than that? Would your spouse agree? If the drive died tomorrow, would you pay more than that to get them back?

Similarly, if the only copy of all your accounting data for your company is on a $10,000 server, that data is quite simply worth more than every other asset in the company combined. Replacing the hardware might be painful, but replacing the data would be impossible, and likely fatal to the company’s future.

The dichotomy between the value of the machine and the value of the data on the machine is seen nowhere more clearly than when getting a warranty replacement. To be clear, a warranty doesn’t say it WON’T break. It says that if it DOES break, you’ll get a repaired or new one, but your fancy arrangement of bits that make up pictures of cats or mission critical databases isn’t warrantied. So sure, you get a part or unit in the mail with that ‘new-computer’ smell that no one with any geek cred can resist, but that leaves you with the task of flipping all those bits back into the proper arrangement. You remember what they all were, right?

Framed against this background, when I hear statements like, “we can’t afford a UPS”, or “they wouldn’t pay for a backup unit” (whoever ‘they’ are), I wonder whether those in charge realized they were one system fault away from disaster.

Running a backup is easy. All our NAS devices can do it using an automated process, either to another NAS, or to a USB hard drive. Some of our TeraStations can back up to Dropbox, or Amazon S3, or MS Azure. We have knowledge base articles on how to set any of that stuff up, and if you are still in your support period, we can help you set it up.

In the end, if you are the kind of leather jacket wearing, pack of cigarettes rolled up in your sleeve-type rebel who lives by your own rules and takes it to the limit, you could totally not have a backup. Laughing in the face of danger, you could disdain those who play it safe and dare to live free. Just… try not to get too much sticker shock if you end up needing that recovery after all.


We work hard to provide the best possible chance of data recovery if you need it. Find out more at https://www.buffalotech.com/data-recovery.
« Last Edit: June 07, 2019, 11:38:53 AM by Eastmarch »
**A single copy of data, even on a RAID array, is NOT a backup! Hard drive failure is not a question of IF, but WHEN! Don't take my word for it, take Google's!**

acer2974

  • Calf
  • Posts: 1
Re: Data Reliability, RAID, and why you really do need to have a backup
« Reply #1 on: January 25, 2021, 01:39:01 PM »
Of course, excellent and all true, and you could add that you need at least three copies, with one off site somewhere. This is because, when the main copy fails, there is a human tendency to immediately try the backup copies. If the failure was caused by something like a hardware error that corrupts data, the second you install the backup copy, it too will be corrupted. If you have a third or fourth copy nearby, you will immediately install those copies as well, they will be corrupted too, and then you are "out of business". As someone who ran a number of very large data centers (several football fields each), I always insisted that we have a backup copy of critical data in an offsite location (one that was not convenient, like maybe Iron Mountain). The offsite facility would not release anything without my signature and a phone check, and then my people would have to explain why this last copy would not be corrupted like the others. If you think this procedure is overdone ("The old man is nuts"), it's not; take the word of someone who has been in the biz for fifty years! (I probably have socks older than most on this forum.) No one ever died saying that they wished they hadn't made so many backup copies (or eaten so much chocolate cake).
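One way to act on the "explain why this copy will not be corrupted" rule is to record checksums when a backup is made and check a copy against them before restoring from it. Below is a minimal sketch using SHA-256 from Python's standard library; the manifest format (relative path mapped to a hex digest) and the function names are hypothetical, not a feature of any particular backup product:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files don't have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_copy(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the files that are missing or whose contents no longer match."""
    problems = []
    for rel_path, expected in manifest.items():
        candidate = root / rel_path
        if not candidate.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_of(candidate) != expected:
            problems.append(f"changed: {rel_path}")
    return problems
```

Keep the manifest somewhere separate from the copy it describes. An empty list from `verify_copy` only says the copy still matches what was recorded at backup time, not that the data was good to begin with.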

Overall, I think it can also be stated that RAID can only (maybe) protect against drive failure, not against hardware faults that prevent access, actually corrupt the data, or kill the NAS. You cannot trust that if you pull out the drives, they will work in another NAS. At home, I have two identical 5400s, each holding a duplicate RAID, with each 5400 powered by a separate electrical panel and UPS. (I have a 400 amp service using two standard 200 amp panels.) I resync the two RAIDs in real time using software on my PC that detects any change to the files. I also make a third bulk copy maybe once a week. I wish there was a better way to have real-time sync between two NASes. Maybe there is! Please advise! Appreciate any ideas or arguments! (My wife says I am always wrong about everything.)
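There are dedicated sync and replication tools for this, and I won't guess at specific product features here. To show the basic idea behind software that "detects any change to the files", here is a minimal one-way Python sketch that compares size and modification time and copies whatever differs. It is a single pass you would run on a schedule, not true real-time sync, and the function names are hypothetical:

```python
import shutil
from pathlib import Path


def needs_copy(src: Path, dst: Path) -> bool:
    """A file needs copying if the destination is missing, a different size,
    or older than the source (a cheap heuristic; it does not read contents)."""
    if not dst.exists():
        return True
    s, d = src.stat(), dst.stat()
    return s.st_size != d.st_size or s.st_mtime > d.st_mtime


def sync_once(src_root: Path, dst_root: Path) -> list[str]:
    """One-way pass: copy new or changed files from src_root to dst_root.
    Files deleted from the source are deliberately left on the destination."""
    copied = []
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        if needs_copy(src, dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves the modification time
            copied.append(rel.as_posix())
    return copied
```

Note the choice not to propagate deletions: a mirror that faithfully copies every deletion also faithfully copies every mistake.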

Lastly, nothing will prevent stupid mistakes... The only thing we are arguing about is when you will lose your data, not if. Thanks...

dumbuser

  • Calf
  • Posts: 3
« Reply #2 on: January 27, 2021, 01:17:51 PM »
Your post is extremely timely, at least to me. I suffered a problem with my LS220D a couple of weeks ago and now, I fear, my data is lost forever: I did not have a backup. Not only did your post address backups, but it also mentioned a new venture for Buffalo: data recovery.
My NAS just went off into never-never land and is not addressable by my LAN.  It can't be turned off other than by pulling the cord.  I was running RAID 0.  I have pulled the drives and tried to read them from my Windows 10 PC but Windows wants to re-format the drives.  Is there any way that the data can be recovered?

1000001101000

  • Debian Wizard
  • Big Bull
  • Posts: 1128
  • There's no problem so bad you cannot make it worse
« Reply #3 on: January 27, 2021, 01:35:21 PM »
You can try connecting them to a Linux system. Your data is stored within an XFS filesystem inside a software RAID (mdadm) array. Depending on the problem, it's possible that the data array is still intact.
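If you do attach the drives to a Linux machine, the kernel reports software RAID status in `/proc/mdstat`. In the status field, `[UU]` means every member is up and an underscore marks a missing member; RAID 0 arrays have no such field at all. Below is a minimal Python sketch that summarizes that text, assuming the common mdstat layout (an `mdX : state level ...` line followed by a line ending in `[n/m] [UU]`). The regexes and function names are my own, and it only reads status; it changes nothing on the disks:

```python
import re
from typing import Optional

# Array line, e.g. "md0 : active raid1 sdb1[1] sda1[0]"
ARRAY_RE = re.compile(r"^(md\w+)\s*:\s*(\S+)(?:\s+\([^)]*\))?(?:\s+(raid\d+|linear))?")
# Status line, e.g. "      976630464 blocks super 1.2 [2/2] [UU]"
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def parse_mdstat(text: str) -> dict[str, dict]:
    """Collect state, RAID level and member status ("UU", "U_", ...) per array."""
    arrays: dict[str, dict] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        array = ARRAY_RE.match(line)
        if array:
            current = array.group(1)
            arrays[current] = {"state": array.group(2), "level": array.group(3), "members": None}
            continue
        status = STATUS_RE.search(line)
        if status and current is not None:
            arrays[current]["members"] = status.group(3)
    return arrays


def health(info: dict) -> str:
    """Summarize one parsed array."""
    if info["state"] != "active":
        return "inactive"
    if info["members"] is None:
        return "no redundancy info"  # RAID 0 / linear arrays have no [UU] field
    return "degraded" if "_" in info["members"] else "ok"
```

On the Linux box you would feed it the real file, e.g. `parse_mdstat(open("/proc/mdstat").read())`. Whatever it reports, stay read-only until the data is copied off.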

sethman

  • Calf
  • Posts: 1
« Reply #4 on: January 27, 2021, 06:14:32 PM »
I have only superficial tech knowledge. I just received my Buffalo LS220 4TB and set it up in about 20 minutes. I wanted to make sure RAID was enabled, so I followed the instructions to set it up and saw that it was already there. How do I know my RAID is working? After following the setup instructions, I have access to one drive. I must assume the other drive is a mirror, but can I check it?

Thanks for explaining to me,
sethman

Rich_Morin

  • Calf
  • Posts: 3
« Reply #5 on: February 19, 2021, 02:25:43 PM »
Quote from acer2974: "... you need at least three copies with one off site somewhere.  This is because there is a human tendency to immediately try backup copies when the main fails for a reason like the data was corrupted due to hardware error such that the second you install the backup copy, it too will be corrupted."

Yes, and there are other reasons, as well.  For example, a friend of mine was (barely) able to grab a hard drive on his way out of a burning house (in Santa Rosa, CA USA) a few years ago.  Without that, he would have lost all his bits. Copying files to an offsite backup has the advantage that the backup machine (e.g., a cloud server farm) is unlikely to fall prey to the same disaster that takes out your system.  And if it does, you have much bigger problems than a backup will solve (:-).

That said, the kind of human error mentioned above is really the biggest threat for many folks. So, my plan is to use two RAID arrays and run Time Machine on both of them. After that, I'll find another system (e.g., at my brother's house) where I can store a third RAID array. One advantage of Time Machine in this scenario is that it would only have to transmit changes. Git could be used in a similar manner... FYI, I used to have a system where I'd hand a friend an 8mm tape snapshot whenever I saw him; I told him to stuff it somewhere in a closet or drawer; security through obscurity, FTW...

-r

ImaginaryTango

  • Calf
  • Posts: 7
« Reply #6 on: February 26, 2022, 03:02:14 AM »
I've always believed in multiple backups. There have been times when my main drives were a RAID (that was for a business I ran for a while). Other times I back up my data from my computers to a RAID. I prefer two local backups and one cloud backup. Now I live on a large lot with a guest house a few hundred feet away. That lets me put one of my local backups in the house and one in the guest house, which changes the risk factor somewhat.

What really frustrates me and pisses me off is that we don't have regular internet here. We're in a 1-mile stretch where Comcrap and Verizon won't extend service because of the low density. They make so much off the customers in this county that you'd think the county, if it cared, could require them to expand and cover all last-mile customers over a few years. Nope. We're due for Starlink in April. When we get it, once I've verified it's working, the first thing I'll do is a massive update of EVERYTHING on my Amazon cloud account. (For now, due to bandwidth issues, I can only move some things there.)

Without the ability to do remote backups, I feel like I've just walked into my final exam without studying, and in my underwear.

Arsh_gabbi

  • Calf
  • Posts: 2
« Reply #7 on: March 02, 2022, 01:20:46 PM »
thanks for the information

oliverb

  • Calf
  • Posts: 14
« Reply #8 on: June 01, 2022, 07:33:11 AM »
I'm surprised no one has pointed out that RAID 0 just splits the data between the two drives, meaning that the failure of either drive could leave the data unreadable, or at least difficult and expensive to recover. Generally, to read the data of a RAID 0 array, you need to put the two drives into the same type of controller with the same configuration; however, if that controller happens to be a NAS that refuses to boot, then you have a problem.
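To make the RAID 0 point concrete, here is a toy Python model of striping. It is not how any particular NAS lays out its disks; the chunk size and function names are illustrative assumptions. Chunks are dealt round-robin across the drives, so losing one drive punches holes through any file larger than a single chunk:

```python
from typing import Optional


def stripe(data: bytes, drives: int, chunk: int) -> list[bytearray]:
    """Deal fixed-size chunks round-robin across drives, RAID 0 style
    (no copies, no parity)."""
    disks = [bytearray() for _ in range(drives)]
    for index, start in enumerate(range(0, len(data), chunk)):
        disks[index % drives] += data[start:start + chunk]
    return disks


def read_back(disks: list[bytearray], chunk: int, lost: Optional[int] = None) -> bytes:
    """Reassemble the data in stripe order; chunks from the `lost` drive
    come back as '?' placeholders."""
    out = bytearray()
    offsets = [0] * len(disks)
    index = 0
    while True:
        d = index % len(disks)
        piece = disks[d][offsets[d]:offsets[d] + chunk]
        if not piece:
            break  # in round-robin order, the first empty slot marks the end of the data
        offsets[d] += len(piece)
        out += b"?" * len(piece) if d == lost else piece
        index += 1
    return bytes(out)


disks = stripe(b"ABCDEFGHIJ", drives=2, chunk=2)
print(read_back(disks, chunk=2, lost=1))
```

With the second drive gone, every other chunk of the "file" is simply missing, and neither drive on its own holds a readable copy.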

RAID 1 is probably the easiest to recover from, because the two drives should be identical. More importantly, the content of either drive by itself ought to be readable as a normal "volume", and the drive order shouldn't matter. I may not have that 100% right though; it depends on what kind of partition table the NAS creates. Obviously the downside is that you get only 50% of the capacity compared to unmirrored.

RAID 5 looks like a good compromise because only one drive's worth of storage (maybe 20-25% on a typical four- or five-drive array) is lost to parity, and the array should withstand the loss of one drive. However, while degraded (one drive down), the array is in a weird state where a further failure could render it nearly unrecoverable, and a "rebuild" puts the array under significant strain, meaning there is a higher than normal risk of failure.
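The "parity" is less arcane than it sounds: it is essentially XOR. Here is a toy Python illustration, assuming one stripe spread over three data drives plus one parity drive. Real RAID 5 rotates parity across all the drives, and the block contents here are made up:

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-by-byte XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"cats", b"dogs", b"fish"]  # one stripe across three data drives
parity = xor_blocks(data)           # stored on a fourth drive

# Drive 2 (holding b"dogs") dies: XOR the survivors with parity to rebuild it.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)
```

This also shows where the 20-25% figure comes from: each stripe spends one block in N on parity. With two blocks missing, the XOR of what remains no longer pins down either one, which is why a second failure while degraded or rebuilding is so dangerous.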

With RAID 5 you should probably make precautionary drive swaps to make sure the drives don't all wear out simultaneously, and also plan on the basis that one day it will fail hard. By "fail hard" I mean getting into a state where you need to ask whether it is better to just restore a backup onto new drives rather than spend the time and money trying to recover the array. This goes double for "array of arrays" configurations.

Regardless of the technology, you need to consider the possibility of the array undergoing "critical existence failure" and becoming completely unavailable to you. Aside from fire and theft, there's the possibility that a power supply might fail into an overvoltage condition and brick anything it is connected to. An "encrypted drive" configuration might lose its encryption keys, effectively reformatting itself instantly.

Even if the array is perfectly intact it is possible that ransomware on a connected machine might quietly encrypt the NAS contents.
« Last Edit: July 31, 2022, 04:51:53 AM by oliverb »

Eastmarch

  • 1500 Lb Water Buffalo
  • Administrator
  • Posts: 339
« Reply #9 on: July 26, 2022, 01:19:38 PM »
Yeah, RAID 0 means Zero redundancy. It's actually -less- safe than a single drive because you have two points of failure. If you have some unresolved issues with your data, and secretly wish it harm, this is a good choice. Otherwise, don't do it.
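The "less safe than a single drive" point in numbers: a simple Python model that assumes each drive independently fails with the same probability `p` over some period and that nothing gets replaced in the meantime (both simplifying assumptions; the 2% value is just an example):

```python
def survive_single(p: float) -> float:
    """One drive: the data survives if that drive survives."""
    return 1 - p


def survive_raid0(p: float, drives: int = 2) -> float:
    """Striping: any single drive failure loses the array, so all must survive."""
    return (1 - p) ** drives


def survive_raid1(p: float, drives: int = 2) -> float:
    """Mirroring: the data is lost only if every copy fails."""
    return 1 - p ** drives


p = 0.02
print(f"single {survive_single(p):.4f}, RAID 0 {survive_raid0(p):.4f}, RAID 1 {survive_raid1(p):.4f}")
```

Keep in mind the RAID 1 figure only covers independent drive failures; deletion, corruption and ransomware get mirrored just as faithfully.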

For a while there, back when people were less familiar with the idea of RAID, we had customers complain pretty often that we had cheated them out of half their storage space. They were in line behind the people who were upset that their 2TB hard drive only showed about 1.82 TB.
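That shortfall is a units issue: drive makers count a terabyte as 10^12 bytes, while many operating systems divide by 2^40 (a tebibyte) and still label the result "TB". A quick Python check (the function name is just for illustration):

```python
def decimal_tb_to_tib(tb: float) -> float:
    """Convert manufacturer terabytes (10**12 bytes) to tebibytes (2**40 bytes)."""
    return tb * 10**12 / 2**40


print(f"{decimal_tb_to_tib(2):.2f}")
```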

juanbennet34

  • Calf
  • Posts: 1
« Reply #10 on: August 29, 2022, 01:18:04 AM »
There are several reasons.
Data reliability is an essential basis for establishing data trust within the organisation. Ensuring data reliability is one of the key goals of data integrity programmes, which are also used to uphold data security, data quality, and regulatory compliance.
RAID arrays can improve data safety, but the extra discs they contain shouldn't be viewed as backups. You must still back up your primary disc even if it is part of a RAID array.
If, for example, your RAID array has 12 TB of storage, you should back it up to another device. This could be, for instance, two ordinary 8 TB drives, each holding a backup of a portion of the RAID array's total storage. Your Mac sees the RAID array as a single volume, allowing any backup programme, such as Intego Personal Backup, to copy data from those drives to other drives.
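Splitting one large volume across two smaller backup drives is a small packing problem. A minimal Python sketch using the greedy first-fit-decreasing heuristic (my choice, not something any backup programme is known to do); the folder names and sizes are hypothetical, and a folder bigger than any single backup drive would have to be split by hand:

```python
def assign_to_drives(folders: dict[str, float], capacities: list[float]) -> list[list[str]]:
    """First-fit decreasing: place the largest folders first, each on the first
    backup drive that still has room for it."""
    free = list(capacities)
    plan: list[list[str]] = [[] for _ in capacities]
    for name, size in sorted(folders.items(), key=lambda item: item[1], reverse=True):
        for i, room in enumerate(free):
            if size <= room:
                plan[i].append(name)
                free[i] -= size
                break
        else:  # no drive had room: split this folder or use a bigger drive
            raise ValueError(f"{name} ({size} TB) does not fit on any backup drive")
    return plan


# Hypothetical folder sizes in TB from a 12 TB array, backed up to two 8 TB drives.
folders = {"Photos": 5.0, "Video": 4.5, "Music": 1.5, "Documents": 1.0}
print(assign_to_drives(folders, [8.0, 8.0]))
```

First-fit decreasing is not guaranteed to find a packing whenever one exists, but with this much headroom it rarely matters.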

oliverb

  • Calf
  • Posts: 14
« Reply #11 on: September 04, 2022, 01:09:58 PM »
Another really important point about backups is to make sure the right things are being backed up and that you are able to recover successfully. One nice thing about mirroring to another NAS is that you can actually "see" the backup copy as a Windows drive. The downside is that it is hard to get "depth": a mirror backup only holds the latest version, so it doesn't protect against files being changed or deleted.
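The simplest way to get some "depth" is to keep several dated copies instead of one mirror. A minimal Python sketch, assuming `backup_root` contains nothing but these snapshot folders and that full copies are affordable (real backup tools usually save space with incremental or deduplicated copies); the function name is hypothetical:

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def take_snapshot(source: Path, backup_root: Path, keep: int,
                  now: Optional[datetime] = None) -> Path:
    """Copy `source` into a new timestamped folder under `backup_root`, then
    prune the oldest snapshots so that at most `keep` remain."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    # Zero-padded timestamps sort alphabetically in chronological order.
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    target = backup_root / stamp
    shutil.copytree(source, target)  # raises if a snapshot with this stamp exists
    snapshots = sorted(p for p in backup_root.iterdir() if p.is_dir())
    for old in snapshots[:-keep]:
        shutil.rmtree(old)
    return target
```

If a file is deleted or mangled, the older snapshots still hold the previous version until it ages out of the `keep` window, so pick a window longer than it would take you to notice a problem.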

Data recovery "fire drills" should be considered.

Also, think carefully before encrypting your backup copies. There's a trade-off between the risk of the backup falling into the wrong hands and the possibility of you being denied access to your files at a critical moment. Also, I don't see why the encryption in budget backup tools is going to be any better than the legacy encryption on ZIP files, which turned out to be extremely vulnerable to known-plaintext attacks.

My own experience of backup failure was that a crucial folder had been excluded from the backup because it happened to have the same name as another folder on a different computer. This was when running a "peer to peer" windows network, pre-NAS.

In another story on Reddit, someone had a Windows system divided into a system volume and a data volume, then made immaculate backups of the system volume but not the data volume.
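A cheap guard against both of these stories is to write down the folders that actually matter and check them against what the backup job includes. A minimal Python sketch; the paths are hypothetical Windows examples (hence `PureWindowsPath`, which also compares case-insensitively), and it only understands whole-folder sources and exclusions, not the name-pattern rules some backup tools use:

```python
from pathlib import PureWindowsPath
from typing import Sequence


def is_under(path: PureWindowsPath, folder: PureWindowsPath) -> bool:
    """True if `path` is `folder` itself or somewhere inside it."""
    return path == folder or folder in path.parents


def uncovered(important: Sequence[str], sources: Sequence[str],
              exclusions: Sequence[str] = ()) -> list[str]:
    """Return the important folders that no backup source includes, or that
    sit inside an excluded folder."""
    missing = []
    for raw in important:
        path = PureWindowsPath(raw)
        included = any(is_under(path, PureWindowsPath(s)) for s in sources)
        excluded = any(is_under(path, PureWindowsPath(x)) for x in exclusions)
        if excluded or not included:
            missing.append(raw)
    return missing


# Hypothetical setup: the job backs up C:/ but the real data lives on D:/.
print(uncovered(
    important=["C:/Users/Me/Documents", "D:/Data/Accounts", "C:/Projects/Clients"],
    sources=["C:/"],
    exclusions=["C:/Projects/Clients"],
))
```

It will not catch an exclusion that sits deeper than the folder you listed, so list the folders you care about at the level you would actually restore them.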

Then there was a situation with a brand of backup software where I seem to recall there was a patent dispute regarding compression algorithms and newer versions of the software couldn't uncompress backups made with older versions. Not sure about the outcome as we never actually had to use the backup.

Also, there was a tape drive that worked quite well initially when connected to an FDD (floppy drive) port, but then we bought an "accelerator" card, and after that it seemed to snap tapes regularly.

Elsewhere stories abound of organizations that have a fireproof safe full of blank tapes.

« Last Edit: September 06, 2022, 01:04:29 PM by oliverb »

DianeMoore

  • Calf
  • Posts: 2
« Reply #12 on: April 18, 2023, 12:32:50 PM »
helpful information