Storage primer

Jan 1, 2011 12:00 PM, By Ciprian Popoviciu and Mohamed Khalia

    

The first question that might come to mind on seeing the title of this article is: Why would I care to learn about storage? Isn't it a mundane component of the infrastructure that simply stores data? The reality is that, of all the components of the infrastructure, storage is truly unique. Not only does it hold some of the organization's most valuable assets, its business-critical data, but unlike the other resources (network and compute), storage demand grows continuously along with the data the organization accumulates. If that were not convincing enough, it is worth mentioning that a poor understanding of the storage options can lead to expensive storage hardware that does not match actual needs. Storage is also one of the most dynamic technology areas today. This article provides a high-level overview of storage and the trends relevant to content distribution.

Storage types

The key thing to remember is that there are two dominant storage technologies: storage area networks (SAN) and network attached storage (NAS). (See Figure 1.) SAN is a block-level storage technology (fixed-size blocks of data, or collections of disk sectors, are read from or written to storage) in which the storage devices are made accessible to servers in such a way that they appear to the operating system as if they were locally attached. A SAN typically has its own infrastructure connecting the storage devices, which are generally not accessible to regular devices through the local area network. By contrast, NAS is a file-level storage technology (entire files are read from or written to storage) connected to a local area network and providing data access to heterogeneous clients. This ubiquitous connectivity made NAS popular as a convenient method of sharing files between multiple computers. Potential additional benefits of NAS include faster data access, easier administration and simple configuration. In the end, price and performance are the main differentiators between NAS and SAN. Choosing one technology over the other comes down to deciding how much complexity is acceptable and what is needed to meet the performance needs of the application within the budget.

Storage system components

The main components of the storage system are the data containers, which typically are hard-disk drives (HDD) or solid-state drives (SSD). The drives distinguish themselves through capacity and read/write speed. Advances in technology have enabled disk sizes to grow from megabytes to gigabytes and today to terabytes of data. Access speed is measured as the number of I/O operations per second (IOPS). For a hard-disk drive, the faster the drive spins, the higher the IOPS. Solid-state drives provide the highest performance (especially on reads; the advantage is less pronounced on writes), and that is naturally reflected in price.
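The link between spindle speed and I/O rate can be sketched with a common rule-of-thumb estimate: one random I/O costs roughly an average seek plus half a revolution of rotational latency. The function name and the seek times below are illustrative assumptions, not vendor figures, and the estimate ignores transfer time, caching and queuing.

```python
def estimate_hdd_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rough random-I/O estimate for a spinning disk.

    Assumes each I/O costs one average seek plus the average rotational
    latency (the time for half a revolution). Transfer time, caching
    and command queuing are ignored.
    """
    rotational_latency_ms = 0.5 * 60_000.0 / rpm
    service_time_ms = avg_seek_ms + rotational_latency_ms
    return 1000.0 / service_time_ms


# Seek times here are assumed values chosen only for illustration.
for rpm, seek_ms in [(7200, 8.5), (10000, 4.5), (15000, 3.5)]:
    print(f"{rpm:>6} rpm: ~{estimate_hdd_iops(rpm, seek_ms):.0f} IOPS")
```

Even under these generous assumptions a single spinning disk delivers only on the order of a hundred or so random I/Os per second, which is why arrays spread load over many drives.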

The performance of the overall storage system also depends on the interfaces and protocols used to connect to the disks. In fact, disks are most often named after the connecting interface: FC drives, SATA drives or SAS drives. Fibre Channel (FC) is the cornerstone of SAN. Serial Advanced Technology Attachment (SATA) is a standardized interface that replaces the older Parallel ATA (PATA) and delivers higher speeds. Serial Attached SCSI (SAS) is a newer serial protocol that is compatible with SATA but much faster. From a selection perspective, it is thus important to understand what level of performance your application requires and to select a cost-effective technology that supports it. For example, even though it is more expensive, a Fibre Channel drive at 4Gb/s is going to be only marginally faster than a 3Gb/s SAS drive.

The disk arrays prevalent in today's enterprises are front ended by purpose-built servers responsible for facilitating and optimizing disk access. These storage appliances distribute data over multiple disks and perform many other operational optimizations and management functions as well.

Storage system operation

When it comes to writing data on the disks, the simple option would be to place the entire data set of files on a single disk (assuming it fits). This approach has three drawbacks. First, the I/O is not optimal because the process is serial; the alternative is to write chunks of the original data set to multiple disks in parallel. Second, all data would be lost if that particular disk fails; there is no redundancy unless the data is duplicated to another disk. Finally, when dealing with such large blocks of data, disk space cannot be used very efficiently. Today, storage appliances are responsible for writing data across multiple disks while embedding various error recovery mechanisms. They orchestrate the pool of disks into a redundant array of independent (or inexpensive) disks (RAID). With this distributed method, cheaper, less reliable disks can be used without the fear of losing data.
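To make the idea of striping with error recovery concrete, here is a minimal sketch, assuming a simplified layout with one dedicated parity disk (similar in spirit to RAID 4). Disks are modeled as in-memory lists of chunks, and the function names are hypothetical; a real storage appliance does far more than this.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_with_parity(data: bytes, data_disks: int, chunk_size: int) -> list[list[bytes]]:
    """Spread chunks round-robin over the data disks and keep one XOR
    parity chunk per stripe on an extra, dedicated parity disk."""
    stripe_bytes = data_disks * chunk_size
    # Zero-pad so the data fills a whole number of stripes.
    padded = data + b"\x00" * (-len(data) % stripe_bytes)
    disks: list[list[bytes]] = [[] for _ in range(data_disks + 1)]
    for offset in range(0, len(padded), stripe_bytes):
        stripe = [padded[offset + i * chunk_size: offset + (i + 1) * chunk_size]
                  for i in range(data_disks)]
        for disk, chunk in zip(disks, stripe):  # data disks only
            disk.append(chunk)
        disks[-1].append(xor_blocks(stripe))    # parity for this stripe
    return disks


def rebuild_disk(disks: list[list[bytes]], failed: int) -> list[bytes]:
    """Recompute a failed disk's chunks by XOR-ing the surviving disks."""
    survivors = [d for i, d in enumerate(disks) if i != failed]
    return [xor_blocks(list(chunks)) for chunks in zip(*survivors)]


payload = b"broadcast media essence"
disks = stripe_with_parity(payload, data_disks=3, chunk_size=4)
# Pretend disk 1 failed: its contents can be rebuilt from the others.
assert rebuild_disk(disks, failed=1) == disks[1]
```

Because the parity chunk is the XOR of the data chunks in its stripe, XOR-ing everything that survives yields the missing chunk, whichever single disk is lost.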

Figure 1. Storage topologies.


RAID's primary goals are to optimize input/output and to provide reliability. The standard levels are identified by the techniques they use, from RAID 0 through RAID 6. RAID 0 means data is block striped without any parity or mirroring. At the other extreme is RAID 6, with block-level striping and double distributed parity. (See Figure 2.) It is important to note that there will always be a trade-off between usable disk space and the amount of redundancy added by higher RAID levels. The important takeaway is that good choices of disk types and RAID levels can significantly reduce costs. Returning to the earlier example, instead of expensive FC disks at RAID 5, one can use cheaper SATA drives at a higher RAID level.
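The trade-off between usable space and redundancy can be put into numbers with a short sketch. It assumes equal-sized disks and covers only the common single levels; the function name is hypothetical, and RAID 2 is left out for simplicity.

```python
def raid_summary(level: int, disks: int) -> tuple[float, int]:
    """Return (usable fraction of raw capacity, disk failures tolerated).

    Assumes equal-sized disks. RAID 1 is treated as an n-way mirror.
    """
    if level == 0:            # striping only
        min_disks, redundant = 2, 0
    elif level == 1:          # mirroring: one copy's worth of space is usable
        min_disks, redundant = 2, disks - 1
    elif level in (3, 4, 5):  # single parity (dedicated or distributed)
        min_disks, redundant = 3, 1
    elif level == 6:          # double distributed parity
        min_disks, redundant = 4, 2
    else:
        raise ValueError(f"RAID level {level} is not covered by this sketch")
    if disks < min_disks:
        raise ValueError(f"RAID {level} needs at least {min_disks} disks")
    return (disks - redundant) / disks, redundant


for level in (0, 1, 5, 6):
    fraction, tolerated = raid_summary(level, disks=8)
    print(f"RAID {level}: {fraction:.1%} usable, survives {tolerated} failure(s)")
```

With eight disks, moving from RAID 5 to RAID 6 lowers the usable fraction from 87.5 percent to 75 percent in exchange for surviving a second disk failure.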

There are several protocols used to access storage disk arrays over an enterprise network via the storage appliance (or filer, in the case of NAS). The main ones are iSCSI, in which the SCSI protocol is transported over IP; FCoE, discussed below; and NFS, a file-level protocol that runs on top of UDP or TCP over IP.

Trends in storage

One of the key recent innovations in storage is Fibre Channel over Ethernet (FCoE), which consolidates infrastructure and reduces operational costs by transporting FC frames over Ethernet. It is a block-level protocol that delivers the same reliability and security as Fibre Channel while using the underlying 10Gb/s Ethernet infrastructure. FCoE enables both local network traffic and storage traffic to traverse the same wire, which results in I/O consolidation.

Figure 2. Comparison of single RAID levels.


An important area of optimization for storage is the elimination of duplicate information. With data continuously growing, the last thing we need is to have the same data in multiple locations under different names. Data deduplication is a specialized data compression technique that eliminates coarse-grained redundant data, typically to improve storage use. In the deduplication process, duplicate data is deleted, leaving only one copy of the data to be stored; every other occurrence is replaced by a pointer to that unique copy. The size of the savings depends on the workloads of the enterprise and the type of data.
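The deduplication process described above can be illustrated with a toy store, assuming fixed-size chunks identified by their SHA-256 digest. The class and method names are hypothetical, and real products differ in how they chunk and index data.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: each unique chunk is kept once, and
    every file is an ordered list of pointers (digests) to stored chunks."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}      # digest -> unique chunk
        self.files: dict[str, list[str]] = {}   # file name -> chunk digests

    def write(self, name: str, data: bytes) -> None:
        pointers = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # stored only if new
            pointers.append(digest)
        self.files[name] = pointers

    def read(self, name: str) -> bytes:
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self) -> int:
        return sum(len(chunk) for chunk in self.chunks.values())


store = DedupStore(chunk_size=4)
store.write("clip_a", b"ABCDABCDEFGH")
store.write("clip_b", b"ABCDABCDEFGH")  # same content, different name
assert store.read("clip_b") == b"ABCDABCDEFGH"
assert store.stored_bytes() == 8        # only ABCD and EFGH are kept
```

Two files totaling 24 bytes occupy 8 bytes of chunk storage here; how close a real workload comes to that kind of saving depends on how repetitive its data is.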
