Monday, August 17, 2009

RAID5/6 performance and reliability impact

I know you don't want to hear it, but a fact is a fact: the number one reason for loss of electronic data is user error. Inadvertent deletion, accidental formatting, hot coffee spilled on your laptop, the dog ate the homework, etc. Spilled coffee and accidental formatting happen way more often than hard disk crashes, at least where I hang out. (Why is there a "dental" in "accidental formatting"? It can't be a coincidence with the two of them causing most of the world's pain.)

There is really no foolproof 100% protection from user errors. Users are just too imaginative and sneaky. No matter how well their computers try to protect themselves against their owners, the owners find these inconceivably spectacular ways to lose their baby photos and tax records. Yes, tax records fall victim to more hard disk crashes than there are hard disks in existence, and the IRS is investigating this matter. The dogs are planning a class action suit, too.

There actually is a way to protect against user error: frequent backups. Back up yourself, with utter frequency and consistency. Also, back up to the cloud.

The number two reason is hardware error, such as a hard disk failure or some sort of crash that mangles or destroys your data. Here, there are two ways to protect yourself: automatic data redundancy and backups. The first option will not save you from user error, only from hardware errors and crashes, and even then, not always. The latter is the only relatively foolproof way to protect your data. Back up your data and back up often.

Now that I am done with the Computer Consultant's Number One Mantra, I will concentrate on data redundancy in the form of RAID, or Redundant Array of Independent Disks.

This is going to be boring.

The two most common desktop RAID levels are 0 and 1. The Zero isn't redundant at all, and I have no clue why it is still called "redundant", but it's a long-running tradition, and I like to follow traditions, with the exception of popcorn at the movies. I neither understand nor follow this strange tradition of drowning the 24-speaker surround sound, exquisitely crafted by Hollywood, in popcorn crunching right between your ears. You do? Write a comment.

While RAID levels 0 and 1 are the most common, RAID levels 5 and 6 are gaining ground on home computers used for storing photos and videos. The reason is simple: 0 and 1 are not good enough. 0 is not protected, and 1 isn't very efficient. RAID 0 simply distributes data across several drives in blocks (stripes), storing only one copy of each block. This makes the set faster but completely unprotected: one disk fails and all the data stored on the set is gone. Not only is the Zero unprotected, it puts your data at a higher risk than a single drive, because the whole set is lost if any one of its drives fails. The Mantra above is especially important with the Zero; please repeat after me:

Back up your data and back up often.
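To put numbers on why the Zero is riskier than a single drive, here is a minimal Python sketch. It stripes data round-robin across disks in fixed-size chunks, then estimates the chance of losing the whole set, assuming each disk fails independently with the same probability over the same period. The function names, the 2-byte chunk size and the 3% per-disk failure probability are made up for illustration; real arrays use much larger chunks, and real drives don't fail quite that independently.

```python
from typing import List


def stripe(data: bytes, num_disks: int, chunk_size: int) -> List[bytearray]:
    """Distribute data across disks in round-robin chunks (RAID 0 style)."""
    disks = [bytearray() for _ in range(num_disks)]
    for offset in range(0, len(data), chunk_size):
        chunk_index = offset // chunk_size
        # Each chunk is stored exactly once, on one disk: no redundancy.
        disks[chunk_index % num_disks].extend(data[offset:offset + chunk_size])
    return disks


def array_loss_probability(p_disk: float, num_disks: int) -> float:
    """Chance that a RAID 0 set loses data: it survives only if every
    disk survives (hypothetical independent failures)."""
    return 1.0 - (1.0 - p_disk) ** num_disks


print([bytes(d) for d in stripe(b"ABCDEFGH", num_disks=2, chunk_size=2)])
# [b'ABEF', b'CDGH']
print(round(array_loss_probability(0.03, 1), 4))  # 0.03   (single drive)
print(round(array_loss_probability(0.03, 4), 4))  # 0.1147 (four-disk RAID 0)
```

With these assumed numbers, a four-disk Zero is almost four times as likely to lose everything as a single drive.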

I warned you it was going to be boring.

RAID level 1 puts each byte onto two drives simultaneously. Each byte is stored twice, so if one drive fails, there is still a copy on the other drive. The data is protected right up until one of the drives fails. Once it fails (and eventually, they all do), the data is not protected until the RAID1 set is rebuilt, i.e. the failed drive is replaced with a new one and the set is restored to "healthy" status. This means RAID1 not only takes two drives to protect one, it also isn't 100% protection against drive failure. (Repeat after me...) In the end, the cost of protection is 50% of the total capacity: a RAID Level 1 set of two 1TB disks has a usable capacity of 1TB, half of its 2TB total.
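A toy RAID1 mirror in Python shows the "protected right up until one of the drives fails" part: every write goes to both disks, a read can be served from any survivor, and after one failure the set runs on a single copy until it is rebuilt. The class and method names are hypothetical; this is a sketch of the idea, not how any real controller works.

```python
from typing import List, Optional


class Mirror:
    """A toy RAID 1 set: every write is copied to all working disks."""

    def __init__(self, num_disks: int = 2, num_blocks: int = 8) -> None:
        # Each disk is a list of blocks; None marks a failed disk.
        self.disks: List[Optional[List[bytes]]] = [
            [b""] * num_blocks for _ in range(num_disks)
        ]

    def _survivors(self) -> List[List[bytes]]:
        return [d for d in self.disks if d is not None]

    def write(self, block: int, data: bytes) -> None:
        for disk in self._survivors():
            disk[block] = data

    def read(self, block: int) -> bytes:
        survivors = self._survivors()
        if not survivors:
            raise OSError("all mirrors failed: data lost")
        return survivors[0][block]  # any surviving copy will do

    def fail(self, index: int) -> None:
        self.disks[index] = None

    def rebuild(self, index: int) -> None:
        """Swap in a new disk and copy everything back from a survivor."""
        survivors = self._survivors()
        if not survivors:
            raise OSError("nothing left to rebuild from")
        self.disks[index] = list(survivors[0])


m = Mirror()
m.write(0, b"tax records")
m.fail(0)         # one drive dies...
print(m.read(0))  # ...the other still has b'tax records'
m.rebuild(0)      # replace it; the set is "healthy" again
```

If the second drive dies before `rebuild` runs, `read` raises: that window is exactly why RAID1 is not a substitute for backups.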

With gigabytes and terabytes getting cheaper, lighter and physically smaller, it's not far-fetched to put 4 or 5 hard disks in a computer and try to protect them against the failure of one. Or two. Enter RAID levels 5 and 6: that's exactly what they do. Level 5 protects against the failure of any single hard disk in a RAID5 set, and Level 6 against the failure of any two. The cost of this protection is the capacity of one or two disks, respectively. In other words, a RAID Level 5 set of five 1TB disks has a usable capacity of 4TB, and a RAID Level 6 set of the same five disks has 3TB. This is more efficient than RAID Level 1, as the usable capacity is more than 50% (80% and 60% in this example). There are also other RAID levels we will not touch in this article, as they are far less common than 0, 1, 5 and 6.
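The capacity arithmetic above fits in a few lines of Python. This sketch assumes identical disks and covers only levels 0, 1, 5 and 6; the function name is made up for illustration.

```python
def usable_capacity_tb(level: int, num_disks: int, disk_tb: float) -> float:
    """Usable capacity of a RAID set built from identical disks."""
    min_disks = {0: 2, 1: 2, 5: 3, 6: 4}
    if level not in min_disks:
        raise ValueError(f"RAID {level} is not covered by this sketch")
    if num_disks < min_disks[level]:
        raise ValueError(f"RAID {level} needs at least {min_disks[level]} disks")
    if level == 0:
        return num_disks * disk_tb        # striping only, no redundancy
    if level == 1:
        return disk_tb                    # every disk holds the same copy
    if level == 5:
        return (num_disks - 1) * disk_tb  # one disk's worth of parity
    return (num_disks - 2) * disk_tb      # RAID 6: two disks' worth of parity


for level, n in [(1, 2), (5, 5), (6, 5)]:
    usable = usable_capacity_tb(level, n, 1.0)
    raw = n * 1.0
    print(f"RAID {level}, {n} x 1TB: {usable:g}TB usable ({usable / raw:.0%})")
# RAID 1, 2 x 1TB: 1TB usable (50%)
# RAID 5, 5 x 1TB: 4TB usable (80%)
# RAID 6, 5 x 1TB: 3TB usable (60%)
```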

That said, RAID Levels 5 and 6 put a much heavier load on individual hard disks than other levels, notably 0 and 1, and heavier-duty (enterprise-level) hard disks are recommended for these configurations. In RAID5 and RAID6 arrays, every disk holds both data and parity: the parity blocks are spread across all members of the set, with one parity block per stripe in RAID5 and two in RAID6. When an application changes a small piece of data, the controller cannot simply overwrite it; it has to read the old data and the old parity, compute the new parity, and write both the new data and the new parity back. That turns one logical write into four disk operations on RAID5 and six on RAID6, compared to two on RAID1 and one on RAID0. The potential performance penalty is significant and can range from 10-15% to 90%, depending on the application and on drive and controller characteristics. While disk and controller caches often reduce the penalty and improve RAID5/6 write performance, the extra reads and writes behind each small write still remain. It is thus recommended to use heavy-duty hard drives with higher MTBF ratings, designed for enterprise applications, in RAID5 and RAID6 arrays and their derivatives.
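For the curious, here is a minimal Python sketch of the read-modify-write cycle behind the RAID5 write penalty, assuming the common XOR parity scheme. It also shows how parity rebuilds a lost block, and turns the usual rule-of-thumb penalties (1, 2, 4 and 6 disk operations per small random write for levels 0, 1, 5 and 6) into a rough throughput estimate. RAID6's second parity block is not a plain XOR (it is typically computed with Reed-Solomon style arithmetic), so the XOR part only models RAID5. The figure of 80 random IOPS per 7200rpm disk and all function names are assumptions for illustration, and caches are ignored.

```python
from functools import reduce

# Rule-of-thumb disk operations per small random write (caches ignored).
WRITE_PENALTY = {0: 1, 1: 2, 5: 4, 6: 6}


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (RAID 5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def small_write_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """Read-modify-write: new parity needs only the old data block and the
    old parity block (2 reads), then data and parity are written (2 writes)."""
    return xor_blocks(old_parity, old_data, new_data)


def random_write_iops(level: int, num_disks: int, iops_per_disk: float) -> float:
    """Rough array-level random-write IOPS for a given RAID level."""
    return num_disks * iops_per_disk / WRITE_PENALTY[level]


# A toy stripe on a 4-disk RAID 5 set: three data blocks plus one parity block.
blocks = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
parity = xor_blocks(*blocks)

# Update the middle block without reading the other data blocks.
new_block = b"\x33\x44"
parity = small_write_parity(blocks[1], parity, new_block)
blocks[1] = new_block
assert parity == xor_blocks(*blocks)  # same as recomputing the whole stripe

# Lose the disk holding blocks[0]; XOR of the survivors rebuilds it.
assert xor_blocks(blocks[1], blocks[2], parity) == blocks[0]

# Hypothetical: five 7200rpm disks at ~80 random IOPS each.
for level in (0, 5, 6):
    print(f"RAID {level}: ~{random_write_iops(level, 5, 80):.0f} random write IOPS")
# RAID 0: ~400 random write IOPS
# RAID 5: ~100 random write IOPS
# RAID 6: ~67 random write IOPS
```

Every one of those extra reads and writes lands on a physical disk, which is where the "double duty" on individual drives comes from.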

Examples of "lighter duty" desktop drives not recommended for RAID5/6 arrays with moderate to heavy performance loads:
Examples of "heavy duty" enterprise hard disks recommended for high performance RAID5/6 applications:
Note: I am not including enterprise-class hard drives with rotational speeds of 10,000rpm or higher, or SSD models, because they are still quite a bit more expensive than their 7200rpm counterparts and are usually cost-prohibitive for mainstream video editing applications.
