RAID Overview
Definition of RAID
RAID, or Redundant Array of Inexpensive Disks (later also expanded as Redundant Array of Independent Disks), is an acronym first used in a 1988 paper by Berkeley researchers Patterson, Gibson and Katz. RAID is a technology developed to improve data protection and performance when storing large amounts of data, without necessarily requiring improvements in disk drive technology.
RAID Levels
As awareness of RAID technology has grown, several RAID configurations for storing data have been devised and standardized. These RAID "levels" are now commonly discussed in the industry. The simplest RAID configurations either "stripe" data across two drives to increase data transfer speed, but offer no data protection; or "mirror" redundant data onto a second drive, without increasing performance. More advanced configurations involve three or more drives, and simultaneously provide fault tolerance, increased performance, and the ability to "recreate" information onto a spare drive should a drive failure occur. These more advanced RAID configurations are preferred in server environments where maximum data availability and performance are critical.
The applications, advantages, and disadvantages of the different RAID configurations, or levels, are described below. The numbers assigned to each level of RAID do not indicate superiority or effectiveness; they are only used to differentiate between them.
RAID 0 - Disk Striping
With RAID 0, a configuration also known as "data striping", data is written in sequential sections across two or more drives. RAID 0 is easy to implement, and it can dramatically improve performance. Several drives can be accessed at once, minimizing the overall "seek" time of larger files. However, this configuration has no data redundancy and therefore no protection against data loss, so it should not be used for business-critical applications.
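As a rough illustration of striping, the sketch below maps a logical block number to a drive and a position on that drive using simple round-robin placement. The function name `locate_block` and the one-block stripe unit are assumptions made for this example; real controllers typically use larger, configurable stripe sizes.

```python
def locate_block(logical_block: int, num_drives: int) -> tuple[int, int]:
    """Map a logical block to (drive index, block offset on that drive).

    Hypothetical helper: assumes a stripe unit of exactly one block.
    """
    if num_drives < 2:
        raise ValueError("striping needs at least two drives")
    if logical_block < 0:
        raise ValueError("logical_block must be non-negative")
    # Consecutive logical blocks rotate across the drives.
    return logical_block % num_drives, logical_block // num_drives


# Six consecutive blocks spread over a three-drive array.
layout = [locate_block(block, 3) for block in range(6)]
print(layout)
```

Because consecutive blocks land on different drives, a large sequential transfer can keep all drives busy at once. Losing any one drive, however, loses every third block in this example, which is why striping alone offers no protection.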
RAID 1 - Mirroring
Also known as "drive mirroring", RAID 1 simultaneously copies data to a second drive. Mirroring offers data protection, and the array continues to perform well if one of the mirrored drives fails. RAID 1 is the simplest RAID configuration: it requires a minimum of two drives of equal capacity, and further drives must be added in pairs. The main disadvantage of RAID 1 is its 100% drive overhead (the highest of all RAID levels), which can be considered an inefficient use of drive capacity.
RAID 2 - Redundancy Using Hamming Code
A RAID 2 array stripes data across a group of drives at the bit level. A Hamming code Error Checking and Correction (ECC) symbol for each data stripe is stored on dedicated ECC drives. This code provides detection and correction of data errors, allowing data to be recovered without completely duplicating the data. However, since most drives now embed ECC information within each sector as standard, RAID 2 doesn't offer any advantages over RAID 3.
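To illustrate how a Hamming code can both detect and correct an error without duplicating the data, here is a minimal sketch of the classic Hamming(7,4) code, which protects four data bits with three check bits. The function names and the choice of the (7,4) code are assumptions for illustration only; the text above does not say which code size or drive count a given RAID 2 array uses.

```python
def hamming_encode(data_bits):
    """Encode 4 data bits as a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # checks positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming_correct(codeword):
    """Return (corrected codeword, 1-based error position, or 0 if clean)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4  # syndrome points at the bad bit
    if position:
        c[position - 1] ^= 1  # flip the bad bit back
    return c, position


original = hamming_encode([1, 0, 1, 1])
damaged = list(original)
damaged[4] ^= 1  # simulate a single-bit error at position 5
repaired, error_position = hamming_correct(damaged)
print(original, damaged, repaired, error_position)
```

If each bit position is thought of as living on its own drive, the three check bits play the role of the dedicated ECC drives: a single bad bit is located by the syndrome and flipped back.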
RAID 3 - Striping Plus Parity
RAID 3 stripes data across multiple drives, with an additional drive dedicated to parity for error correction and recovery. This configuration offers very high data transfer rates and needs only a small ratio of parity drives to data drives. However, RAID 3 requires a complicated controller design, and the array may be difficult to rebuild after a drive failure.
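Parity-based recovery can be sketched with a bytewise XOR, the usual way parity is computed in parity RAID levels. In the example below, the helper name `xor_blocks` and the sample data are hypothetical; the point is only that XOR-ing the surviving blocks with the parity block yields the missing block.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data drives, each holding one 4-byte block of the stripe.
data = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_blocks(data)  # stored on the dedicated parity drive

# Simulate losing the second data drive, then rebuild its block
# from the surviving data blocks plus the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)
```

This works because XOR-ing a value with itself cancels out, so everything except the missing block disappears from the result. It also shows why a single parity drive can cover only one failed drive at a time.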
RAID 4 - Independent Striping Plus Parity
RAID 4 is identical to RAID 3 except that large blocks are used for striping, so that records can be read from any individual drive in the array apart from the parity drive, allowing read operations to be overlapped. Since RAID 4 offers no significant advantages over RAID 5, the RAID 4 configuration is now rarely implemented.
RAID 5 - Independent Striping Plus Distributed Parity
With RAID 5, each block of data is written on a data drive and parity information is then striped across all drives. RAID 5 is the most popular of the RAID levels because it delivers data protection and good performance with a small overhead for parity. RAID 5 offers the most efficient use of drive capacity of all the redundant RAID levels. This configuration requires at least three drives of equal size; additional drives can be added one at a time.
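The idea of distributing parity across all drives can be pictured with a small layout sketch. The rotation rule used here (parity moves one drive to the left on each successive stripe) is an assumption chosen for illustration; actual controllers use several different rotation schemes, and the function name `raid5_stripe_layout` is hypothetical.

```python
def raid5_stripe_layout(stripe: int, num_drives: int) -> list[str]:
    """Return what each drive holds for one stripe: 'P' or a data block label.

    Assumed rotation: parity starts on the last drive and moves left
    by one drive per stripe, wrapping around.
    """
    if num_drives < 3:
        raise ValueError("this sketch assumes at least three drives")
    parity_drive = (num_drives - 1 - stripe) % num_drives
    data_index = stripe * (num_drives - 1)  # first data block in this stripe
    layout = []
    for drive in range(num_drives):
        if drive == parity_drive:
            layout.append("P")
        else:
            layout.append(f"D{data_index}")
            data_index += 1
    return layout


for stripe in range(4):
    print(stripe, raid5_stripe_layout(stripe, 3))
```

Because the parity block lands on a different drive for each stripe, no single drive becomes the bottleneck for every write, unlike the dedicated parity drive in RAID 3 and RAID 4.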
RAID 6 - RAID 5 With Double Parity (or "P+Q Redundancy")
(Not recognized by the RAID Advisory Board (RAB).)
RAID 6 is an extension of RAID 5 that uses a second independent distributed parity scheme. Data is striped on a block level across a set of drives, and then a second set of parity is calculated and written across all of the drives. This configuration provides extremely high fault tolerance and can sustain two simultaneous drive failures, but it requires an "n+2" number of drives and a very complicated controller design.
RAID 10 - Combination of RAID 1 and RAID 0
(Not recognized by original Berkeley papers or by the RAB.)
RAID 10 combines RAID 0 and RAID 1: it stripes data across multiple drives without parity and mirrors the entire array to a second set of drives. This process delivers fast data access (like RAID 0) and single-drive fault tolerance (like RAID 1), but cuts the usable drive space in half. RAID 10 requires a minimum of four equally sized drives, is the most expensive RAID solution, and offers limited scalability.
RAID 53 - Combination of RAID 0 and RAID 3
(Not recognized by original Berkeley papers or by the RAB.)
The RAID 53 configuration should really be called "RAID 03". This configuration is a striped array whose segments are essentially RAID 3 arrays. It has the same fault tolerance and high data transfer rates as RAID 3, with the high I/O rates associated with RAID 0 (striping), plus some added performance. This configuration is very expensive and requires at least five drives to implement.
RAID Applications, Tradeoffs and Limitations
Typically, RAID is used in systems where data accessibility is critical and fault tolerance is required, such as in large file servers. However, RAID is now also increasingly used in desktop systems for CAD and for multimedia editing and playback, where higher transfer rates are needed.
In general, for a given price point, the performance improvement of a particular type of RAID array "trades off" against the amount of redundancy and data security the array provides. Similarly, the capacity of the array "trades off" against price and fault tolerance. Inexpensive RAID solutions are limited in their ability to protect your data or improve performance, whereas high-end RAID implementations providing very high performance and very high data reliability are quite expensive.
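The capacity side of this tradeoff can be made concrete with a small calculation. The sketch below is a simplified model under stated assumptions: all drives are the same size, RAID 1 and RAID 10 use two-way mirrors, RAID 6 needs at least four drives, and "tolerated failures" means the number of drive failures the array always survives (some levels can survive more if the failures happen to fall on the right drives). The function name `raid_summary` is hypothetical.

```python
def raid_summary(level: int, num_drives: int, drive_size_gb: float) -> tuple[float, int]:
    """Return (usable capacity in GB, drive failures always survivable)."""
    if level == 0:
        minimum, usable_drives, tolerated = 2, num_drives, 0
    elif level == 1:
        minimum, usable_drives, tolerated = 2, num_drives / 2, 1
    elif level == 5:
        minimum, usable_drives, tolerated = 3, num_drives - 1, 1
    elif level == 6:
        minimum, usable_drives, tolerated = 4, num_drives - 2, 2
    elif level == 10:
        minimum, usable_drives, tolerated = 4, num_drives / 2, 1
    else:
        raise ValueError(f"level {level} is not covered by this sketch")
    if num_drives < minimum:
        raise ValueError(f"RAID {level} needs at least {minimum} drives")
    if level in (1, 10) and num_drives % 2:
        raise ValueError("mirrored levels need an even number of drives")
    return usable_drives * drive_size_gb, tolerated


# Compare the levels for four 1000 GB drives.
for level in (0, 1, 5, 6, 10):
    capacity, tolerated = raid_summary(level, 4, 1000)
    print(f"RAID {level}: {capacity:.0f} GB usable, survives {tolerated} failure(s)")
```

With the same four drives, usable capacity in this model ranges from all of the raw space (RAID 0, no protection) down to half of it (RAID 1, 6, and 10), which is the capacity-versus-fault-tolerance tradeoff in numbers.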
Although RAID can greatly improve the reliability and performance of a storage system, it is dangerous to assume that a RAID system with redundancy provides absolute data protection. RAID systems remain exposed to sources of failure such as viruses, environmental disturbances, and cases where more than one drive fails at the same time, so regular system maintenance and backup remain critical practices.