Active storage

Jul 1, 2007 12:00 PM, BY PAUL TURNER

In the evolution from tape- to file-based workflows, asynchronous IP-based storage is increasingly chosen for online and nearline archive storage.

Although RAID-based NAS or SAN solutions have been the mainstay of this activity in many cases, grid storage has made inroads over the last year or so. Offering large storage capacities and simplified system management, grid storage is an alternative approach to bulk data storage, but it also offers another possibility: active storage. This article will examine the concept of active storage: what it is, how it works and the advantages that it can bring to the entire workflow.

The fundamentals of grid storage

In a nutshell, grid storage consists of separate, standalone content servers that are each responsible for storing only part (usually referred to as a slice) of each file loaded onto the system. In this way, the file itself is scattered across multiple autonomous content servers. Separate metadata servers decide which slice goes to which content server. (See Figure 1.) The metadata servers also provide the file system namespace to the various clients in the system.
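As a rough illustration of slicing and placement, the following Python sketch cuts a file into fixed-size slices and lets a metadata server assign each slice to a content server. The slice size, the round-robin policy and all names (`slice_file`, `MetadataServer`, `place`) are assumptions made for this example; the article does not describe the placement policy of any real product.

```python
def slice_file(data, slice_size):
    """Cut a file's bytes into fixed-size slices (the last one may be shorter)."""
    return [data[i:i + slice_size] for i in range(0, len(data), slice_size)]


class MetadataServer:
    """Hypothetical metadata server: owns the namespace and decides placement."""

    def __init__(self, content_server_names):
        self.content_servers = list(content_server_names)
        self.namespace = {}  # filename -> content server name for each slice

    def place(self, filename, slice_count):
        # Round-robin placement is an assumption chosen for simplicity.
        placement = [
            self.content_servers[index % len(self.content_servers)]
            for index in range(slice_count)
        ]
        self.namespace[filename] = placement
        return placement


slices = slice_file(b"ABCDEFGHIJ", slice_size=4)
metadata = MetadataServer(["cs0", "cs1"])
placement = metadata.place("clip.mxf", len(slices))
for index, (server, chunk) in enumerate(zip(placement, slices)):
    print(f"slice {index} ({len(chunk)} bytes) -> {server}")
```

The metadata server stores only the mapping from file name to slice locations; the slice data itself would live on the content servers.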

Figure 1. A typical grid storage system

This arrangement is analogous to the operation of a standard hard drive. The content servers are similar to the sectors of a hard drive, and the metadata servers are like the file allocation table of the drive, where a file name is translated into the addresses of the sectors of the disk where the data can be found. The idea has simply been expanded in the case of grid storage.

This architecture allows clients, whether reading or writing, to first ask the metadata servers for the locations of the slices and then interact directly with each content server to gain access to an individual slice. This is significantly faster than the traditional NAS approach, where all access to storage must pass through the NAS head — an obvious bandwidth bottleneck.
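The two-step access pattern described above can be sketched in Python as follows. The in-memory dictionaries stand in for real metadata and content servers, and the names `lookup`, `fetch_slice` and `read_file` are hypothetical; a real client would make network requests instead of dictionary lookups.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for the grid (not a real API).
CONTENT_SERVERS = {
    "cs0": {("clip.mxf", 0): b"AAAA", ("clip.mxf", 2): b"CC"},
    "cs1": {("clip.mxf", 1): b"BBBB"},
}
METADATA = {"clip.mxf": ["cs0", "cs1", "cs0"]}  # server holding each slice


def lookup(filename):
    """Step 1: ask the metadata server where each slice lives."""
    return METADATA[filename]


def fetch_slice(server, filename, index):
    """Step 2: read one slice directly from the content server that holds it."""
    return CONTENT_SERVERS[server][(filename, index)]


def read_file(filename):
    locations = lookup(filename)
    # Slices are fetched in parallel; no single head node carries all traffic.
    with ThreadPoolExecutor(max_workers=len(locations)) as pool:
        parts = pool.map(
            lambda item: fetch_slice(item[1], filename, item[0]),
            enumerate(locations),
        )
        return b"".join(parts)


print(read_file("clip.mxf"))
```

Because the metadata server answers only the location query, the bulk data flows between the client and many content servers at once.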

Another attribute of grid storage is its ability to provide data protection, which is achieved by making copies of the slices onto other content servers in the grid. At any point in time, at least two copies of every slice of each file exist. The principle is that the failure of any individual content server does not render the data unrecoverable because there's always at least one other copy of each slice available somewhere else on the grid.
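A minimal Python sketch of this principle follows. It assumes a simple placement rule (each slice's copies go to consecutive servers) that is an invention for this example, as are the function names; the point is only that copies of one slice never share a server, so a single server failure cannot remove every copy.

```python
def place_replicas(slice_count, servers, replication_factor=2):
    """Assign each slice to `replication_factor` distinct content servers."""
    if replication_factor > len(servers):
        raise ValueError("need at least as many servers as copies")
    return {
        index: [servers[(index + k) % len(servers)] for k in range(replication_factor)]
        for index in range(slice_count)
    }


def surviving_copies(placement, failed_server):
    """Return the holders that remain for each slice after one server fails."""
    return {
        index: [server for server in holders if server != failed_server]
        for index, holders in placement.items()
    }


servers = ["cs0", "cs1", "cs2", "cs3"]
placement = place_replicas(slice_count=5, servers=servers, replication_factor=2)
after_failure = surviving_copies(placement, failed_server="cs1")
for index, holders in after_failure.items():
    print(f"slice {index}: still held by {holders}")
```

Slices that drop to a single holder are the ones a grid would re-replicate to restore the configured number of copies.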

The content servers operate autonomously, so re-replication of missing data can happen simultaneously through a number of content servers operating in parallel. An important item to note is that grid storage systems rebuild data, whereas RAID systems rebuild drives. The latter includes rebuilding sectors of the replacement drive that never held valid data in the original, which is clearly wasted effort. This prolongs the rebuild time and extends the window of vulnerability to another drive failure.

Re-replication of data in a grid storage system happens significantly faster than the rebuilding of a hard drive by a RAID engine, greatly reducing the window of vulnerability. If the replication factor is set to three or higher, the failure of any single drive or content server will not leave the system in a vulnerable state: even if one copy of a slice is completely lost, the data is safe because at least two other copies of the affected slices remain elsewhere on the grid. The replication factor thus gives users a selectable level of data resiliency.
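To make the difference in rebuild time concrete, here is a back-of-the-envelope calculation in Python. Every figure (drive capacity, amount of data stored, transfer rate, number of servers copying in parallel) is an illustrative assumption rather than a measurement, and the model ignores network contention and client load.

```python
def raid_rebuild_seconds(drive_capacity_mb, rebuild_rate_mb_s):
    """A RAID rebuild rewrites every sector of the replacement drive."""
    return drive_capacity_mb / rebuild_rate_mb_s


def grid_rereplication_seconds(used_data_mb, per_server_rate_mb_s, parallel_servers):
    """A grid copies only the lost slices, with many servers working at once."""
    return used_data_mb / (per_server_rate_mb_s * parallel_servers)


# Illustrative assumptions only.
DRIVE_CAPACITY_MB = 500_000   # 500 GB drive
USED_DATA_MB = 200_000        # drive was 40 percent full
RATE_MB_S = 50                # sustained copy rate per drive or server
PARALLEL_SERVERS = 10         # content servers re-replicating in parallel

raid = raid_rebuild_seconds(DRIVE_CAPACITY_MB, RATE_MB_S)
grid = grid_rereplication_seconds(USED_DATA_MB, RATE_MB_S, PARALLEL_SERVERS)
print(f"RAID rebuild:        {raid / 3600:.1f} hours")
print(f"Grid re-replication: {grid / 60:.1f} minutes")
```

Under these assumptions the gain comes from two factors: less data to copy, and more servers sharing the copy work.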

Replication has other advantages too. For example, the average latency encountered by each individual client decreases as the replication factor increases, because each slice can be served from more content servers. This is extremely important in today's production environment.

The concept of active storage

Until recently, storage systems have been passive members of the workflow. Once media was stored on them, it remained there until external systems read the data, manipulated it and then put the result back onto the storage. This was true when media was stored on tape and has remained true in most cases when using disk-based storage.

Grid storage offers a new opportunity. As previously mentioned, grid storage is made up of separate content servers, each of which has a CPU, RAM and all of the other hardware that makes up a modern platform. It is entirely possible for a powerful content server platform to take on additional processing tasks.

For example, each CPU can examine the slices located on its hard drives and perform automatic error checking, calculating a cyclic redundancy check (CRC) from the data. It then compares the CRC to a CRC that was calculated for the slice at the time it was created and was stored along with that data as part of the write process. If the two numbers don't match, the content server can declare its slice to be invalid, and the metadata servers can respond by causing the slice to be re-replicated from a known good copy of the slice to some other storage location within the grid. This effectively makes the system self-healing, with an associated reduction in the need for manual intervention by maintenance staff.
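The check described above can be sketched with Python's standard `zlib.crc32`. The dictionary layout and the function names (`store_slice`, `scrub`, `heal`) are assumptions for illustration. For brevity, the sketch repairs a bad slice in place from a good copy, whereas the article describes re-replicating it to another location on the grid.

```python
import zlib


def store_slice(store, slice_id, data):
    """Write a slice together with the CRC computed at write time."""
    store[slice_id] = {"data": data, "crc": zlib.crc32(data)}


def scrub(store):
    """Recompute each slice's CRC and return the ids that no longer match."""
    return [
        slice_id
        for slice_id, record in store.items()
        if zlib.crc32(record["data"]) != record["crc"]
    ]


def heal(store, good_replica, bad_ids):
    """Replace each invalid slice with the data from a known good copy."""
    for slice_id in bad_ids:
        store_slice(store, slice_id, good_replica[slice_id]["data"])


local, replica = {}, {}
for store in (local, replica):
    store_slice(store, "clip.mxf:0", b"frame data")

local["clip.mxf:0"]["data"] = b"frame dat\x00"  # simulate silent corruption
bad = scrub(local)
print("invalid slices:", bad)
heal(local, replica, bad)
print("invalid after healing:", scrub(local))
```

Running `scrub` periodically on each content server is what lets corruption be found before a client ever reads the damaged slice.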

Taking this idea a step further, it is equally possible to use some of the processing power of the content servers to manage and process media. If the storage is aware that the data it is holding are actually media files, it is possible to use some of the CPU power of the individual content servers to perform media-specific processing tasks in addition to the activity of storing and serving up data.

It is, of course, vital that such use does not impinge on the ability of the content servers to provide data services to the various clients connected to the grid, which is their primary purpose. To this end, it is necessary to add a management layer to the system's code to ensure that no content server becomes oversubscribed. The remaining CPU power can be used as raw processing capability, acting on the data stored on the grid or even on external data sets supplied by an external application server, along with instructions on how to manipulate them. Typically, the components of such a configuration include:

  • application controllers, on which the client application GUIs can run, which manage the operation of their individual applications;
  • grid resource management software, which can receive requests for CPU cycles from the application controllers and in response allocate available CPUs to each requestor; and
  • a grid application loader, which runs on each content server to set up the processing environment on that server and physically launch a process.
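The role of the grid resource management software can be sketched as follows. This is a simplified model under stated assumptions: a fixed number of CPUs per content server is held back for data services, and a request is granted only if it can be met in full. The class and method names are hypothetical and do not describe any real product.

```python
class GridResourceManager:
    """Hands out spare content-server CPUs while protecting storage duties."""

    def __init__(self, cpus_per_server, reserved_for_storage):
        # CPUs held back for data services are never offered to applications.
        self.free = {
            name: max(total - reserved_for_storage, 0)
            for name, total in cpus_per_server.items()
        }
        self.grants = {}  # requester -> {server name: CPUs granted}

    def request(self, requester, cpus_needed):
        """Grant CPUs from the least-loaded servers, or nothing if short."""
        if cpus_needed > sum(self.free.values()):
            return {}
        grant, remaining = {}, cpus_needed
        ordered = sorted(self.free.items(), key=lambda kv: (-kv[1], kv[0]))
        for name, spare in ordered:
            take = min(spare, remaining)
            if take:
                grant[name] = take
                self.free[name] -= take
                remaining -= take
        self.grants[requester] = grant
        return grant

    def release(self, requester):
        """Return a requester's CPUs to the pool when its job finishes."""
        for name, cpus in self.grants.pop(requester, {}).items():
            self.free[name] += cpus


manager = GridResourceManager({"cs0": 4, "cs1": 4}, reserved_for_storage=2)
print(manager.request("transcode", 3))  # met from spare CPUs
print(manager.request("qc", 2))         # refused: would oversubscribe
manager.release("transcode")
print(manager.free)
```

In this model an application controller would call `request`, and the grid application loader on each granted server would then launch the process.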

Suddenly, the system ceases to be a mere storage repository and becomes an active part of the user's workflow. It is easy to see how adding this capability can improve the business of processing material as it passes through the workflow. And such active workflows, by the nature of their parallelism, can operate substantially faster than their passive counterparts. Figure 2 is an example of the processes needed to manage grid storage in this way.

There are several activities that immediately come to mind when considering the possibilities enabled by active storage.



