The long-awaited AXF open format for long-term preservation and storage is designed to support interoperability among systems and ensure future access to valuable, file-based assets regardless of type or how they are stored.
Thanks to technology, we now have more ways than ever to communicate with each other through audio and video. However, the same proliferation of technology that creates so much opportunity has also produced a multitude of formats and systems for storing digital media, and those formats and systems are often incompatible with one another. Here we are not talking about interoperability of the media files themselves (as has long been the dream of MXF), but rather of the operating systems, file systems, storage technologies and devices used to capture, store and protect these valuable media assets now and in the future. This diversity and potential long-term incompatibility make reliable, guaranteed access to these assets complicated, expensive and sometimes downright impossible. Solving the problem means establishing a common format for digital media storage that works not only with any existing system but also with systems that have yet to evolve: an open standard for the long-term storage and preservation of media assets.
Although this may seem unnecessary on the surface, there are many documented cases today where important files stored on dated technology using non-standardized methods have become inaccessible and are therefore lost forever. We will likely be able to recreate an MPEG-2 software decoder on whatever platforms exist 100 years from now, but can we be certain we will find a system that can read FAT32 in order to recover the MPEG-2 content itself?
The answer to this daunting problem lies in the new Archive eXchange Format (AXF), an open format that supports interoperability among existing, discrete content storage systems irrespective of the operating and file systems they use. AXF also future-proofs digital storage by abstracting the underlying technology, so that content remains available no matter how storage or file system technology evolves.
What is AXF?
At its most basic level, AXF is an IT-centric file container that can encapsulate any number of files, of any type, in a fully self-contained and self-describing package. The encapsulated package actually contains its own file system, which abstracts the underlying operating system, storage technology and original file system from the AXF object and its valuable payload. It's like a file system within a file that can store any type of data on any type of storage media.
The embedded file system
This innovative embedded file system approach is AXF's defining attribute. It allows AXF to be both content- and storage-agnostic. In other words, because the AXF object itself contains the file system, it can exist on any generation of data tape, spinning disk, flash, optical media or other storage technology that exists today or might exist tomorrow.
Because of this neutrality, AXF supports the current generation of data tape technologies (LTO-5, TS1140 and T10000C, for example). And because it does not depend on the features of the storage technology itself, it supports legacy storage formats as well.
What makes AXF better?
AXF offers many significant advantages over other formats and approaches such as Tape ARchive (TAR) and Linear Tape File System (LTFS) for long-term storage, protection and preservation.
AXF can scale without limit, which distinguishes it sharply from legacy container formats like TAR. Like AXF, TAR uses a file container approach that works on any file type of any individual or total file size with support for multiple operating systems. However, TAR's age and tape-based roots yield inevitable limitations. For example, it incorporates neither descriptive metadata support nor a central index for file payload information, which makes random access to files challenging and slow. In large TAR archives, the performance penalty is significant, effectively making the format unsuitable for any situation where random access to individual files is required, let alone random access to portions of the contained files as required by operations such as timecode-based, partial restore.
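The cost of having no central index can be illustrated with a minimal sketch using Python's standard tarfile module. The file names and sizes are invented for illustration, and the helper functions are hypothetical; the point is only that locating a member means reading every header that precedes it.

```python
import io
import tarfile


def build_tar(files):
    """Build an in-memory TAR archive from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def headers_scanned_to_find(archive_bytes, wanted):
    """Count how many member headers are read before `wanted` is located."""
    scanned = 0
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as tar:
        for member in tar:  # TAR has no index, so members are visited in order
            scanned += 1
            if member.name == wanted:
                return scanned
    raise KeyError(wanted)


# Illustrative file names only.
archive = build_tar({
    "a.mxf": b"A" * 1000,
    "b.mxf": b"B" * 1000,
    "c.mxf": b"C" * 1000,
})
print(headers_scanned_to_find(archive, "c.mxf"))  # 3
```

With three members the scan is trivial, but the work grows linearly with the number of files, and on linear tape each step may also involve physically moving through the preceding data.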
Certainly, TAR has evolved over the decades, but it has done so typically in divergent paths that lead away from its open-source origins. As a result, it is difficult, or impossible, to recover some TAR packages today, rendering them lost forever.
Also in contrast to TAR, AXF incorporates resiliency features that make it possible to recover object contents, descriptive metadata and media catalogs in a multitude of failure and corruption situations. AXF also incorporates comprehensive fixity and error-checking capabilities in the form of multiple per-file and per-structure checksums. TAR lacks these features that should be considered mandatory for modern systems.
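As a rough illustration of what a per-file fixity check involves, the sketch below computes a checksum while streaming a file and later compares it against the recorded value. The choice of SHA-256, the chunk size and the function names are assumptions made for this example; the article does not state which checksum algorithms AXF uses.

```python
import hashlib
import io


def stream_checksum(stream, algorithm="sha256", chunk_size=64 * 1024):
    """Hash a stream chunk by chunk so the whole file never sits in memory."""
    digest = hashlib.new(algorithm)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(stream, expected_hex, algorithm="sha256"):
    """Return True when the stream still matches the recorded checksum."""
    return stream_checksum(stream, algorithm) == expected_hex


# Record a checksum at archive time, then check it at restore time.
original = b"frame data " * 10_000
recorded = stream_checksum(io.BytesIO(original))
damaged = b"X" + original[1:]  # simulate a single corrupted byte

print(verify_fixity(io.BytesIO(original), recorded))  # True
print(verify_fixity(io.BytesIO(damaged), recorded))   # False
```

Because the hash is updated as each chunk passes through, the check can run alongside a restore rather than as a separate pass over the data.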
The embedded file system enables AXF to translate between any generic set of files and logical block positions on any storage medium, whether the medium has its own file system or not. This essentially abstracts the underlying file system and storage technology and allows systems that comprehend AXF to ignore any of their complexities and limitations.
While AXF can work in harmony with LTFS, it also has advantages over it. LTFS relies on storage technology elements — such as partitioning and file marks on data tape — that hinder both its storage capabilities and its performance. Likewise, a format such as LTFS is ineffective for complex file collections with tens of thousands or even millions of related elements as it lacks any form of encapsulation and instead relies on simplistic file and path arrangements.
AXF can support any number and type of files in a single, encapsulated package, which inherently means these AXF objects can grow very large. With its inherent support for spanning objects across media (such as over multiple data tapes), AXF offers significant advantages over LTFS, which offers no spanning support and is therefore ineffective in the large-scale archives typical in media operations.
For the preservationist community, AXF offers support for the core OAIS (Open Archival Information System) reference model with built-in features such as fixity (per-file checksums and per-structure checksums), provenance, context, reference, open metadata encapsulation and access control. This adherence to established industry practices is another significant benefit of AXF over LTFS.
Once content is stored in the system, the media itself can be transported directly to any other system that also comprehends AXF, offering the same “transport” capabilities as LTFS with the additional features highlighted above.
Front Porch Digital is currently working with SMPTE to standardize AXF and promote it as an industry-wide method for storage and long-term preservation of media assets. Further, the committee hopes its work will extend far outside of the media and entertainment space and into the broader IT community because of AXF's wide-reaching applicability and unparalleled features.
These factors are key to AXF's ability to support large-scale archive and preservation systems as well as simple, standalone applications.
How does AXF work?
AXF is designed so that each AXF Object (or package) comprises three main components regardless of what technology is used to store it (spinning disk, flash media, data tape without a file system, data tape with a file system, etc.). (See Figure 1.)
First, each AXF Object begins with an Object Header, a structure containing descriptive XML metadata such as the AXF Object's unique identifiers (UUID and UMID), creation date, object provenance and file-tree information, including file permissions, paths, etc. Following the AXF Object Header are any number of optional AXF Generic Metadata Packages. These are self-contained, open metadata containers in which applications can include AXF Object-specific metadata. This metadata can be structured or unstructured, open or vendor-specific, binary or XML.
The second component of the Object construct is the File Payload — the actual byte data of the files encapsulated in the object. The payload consists of any number of triplets — File Data + File Padding + File Footer. File padding, which ensures alignment of all AXF Object elements on storage medium block boundaries, is key to the AXF specification. The File Footer structure contains the exact size of the preceding file, along with an optional file-level checksum designed to be processed on-the-fly by the application during restore operations with little or no overhead.
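The triplet layout described above can be sketched in a few lines. This is a toy illustration, not the AXF byte format: the 512-byte block size, the JSON footer encoding, the SHA-256 checksum and the function names are all assumptions chosen to keep the example short.

```python
import hashlib
import json

BLOCK_SIZE = 512  # hypothetical medium block size, chosen for illustration


def padding_needed(length, block_size=BLOCK_SIZE):
    """Bytes of padding that round `length` up to the next block boundary."""
    return (-length) % block_size


def pack_payload(files, block_size=BLOCK_SIZE):
    """Lay out File Data + File Padding + File Footer triplets.

    Returns the packed bytes and the starting block of each file's data.
    The JSON footer is a stand-in, not the real AXF footer structure.
    """
    out = bytearray()
    start_blocks = {}
    for name, data in files.items():
        start_blocks[name] = len(out) // block_size
        out += data
        out += b"\0" * padding_needed(len(data), block_size)
        footer = json.dumps({
            "size": len(data),  # exact size of the preceding file
            "sha256": hashlib.sha256(data).hexdigest(),  # optional checksum
        }).encode()
        out += footer
        out += b"\0" * padding_needed(len(footer), block_size)
    return bytes(out), start_blocks


payload, start_blocks = pack_payload({
    "a.mxf": b"A" * 1000,  # 1000 bytes + 24 bytes padding = 2 blocks
    "b.mxf": b"B" * 700,
})
print(padding_needed(1000))  # 24
print(start_blocks)          # {'a.mxf': 0, 'b.mxf': 3}
```

Because every element starts on a block boundary, a file's position can be expressed as a block number, which is the unit that tape and other block-oriented media address directly.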
The final portion of an AXF Object is the Object Footer, which repeats the information contained in the Object Header and adds information captured during the Object's creation, including per-file checksums and precise file and structure block positions. The Object Footer is important to the resiliency of the AXF specification because it allows efficient re-indexing by foreign systems when the contents of the media are not already known, enabling media transport between systems that follow the AXF specification.
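To show why a footer that records file positions makes re-indexing cheap, here is a deliberately simplified sketch. It is not the AXF layout: the JSON encoding, the byte offsets (AXF records block positions) and the 8-byte length trailer used to locate the footer are hypothetical devices for this example only.

```python
import json
import struct


def build_object(files):
    """Toy object: header + payload + footer + 8-byte footer-length trailer."""
    names = sorted(files)
    header = json.dumps({"files": names}).encode()
    payload = bytearray()
    positions = {}
    for name in names:
        positions[name] = {
            "offset": len(header) + len(payload),  # byte offset within object
            "size": len(files[name]),
        }
        payload += files[name]
    # The footer repeats the header's file list and adds the positions.
    footer = json.dumps({"files": names, "positions": positions}).encode()
    trailer = struct.pack(">Q", len(footer))  # hypothetical footer locator
    return header + bytes(payload) + footer + trailer


def reindex_from_footer(blob):
    """Recover the file catalog using only the end of the object."""
    (footer_len,) = struct.unpack(">Q", blob[-8:])
    footer = json.loads(blob[-8 - footer_len:-8])
    return footer["positions"]


def read_file(blob, positions, name):
    """Jump straight to one file using the recovered positions."""
    entry = positions[name]
    return blob[entry["offset"]:entry["offset"] + entry["size"]]


obj = build_object({"a.txt": b"hello", "b.txt": b"world!"})
catalog = reindex_from_footer(obj)
print(read_file(obj, catalog, "b.txt"))  # b'world!'
```

A system that has never seen the object reads only the footer to learn where everything is, then seeks directly to the file it needs instead of scanning the payload.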
Because of this standardized approach to the Object construct, which abstracts the underlying complexities of the storage media itself, simple access to the content is ensured regardless of the evolution of technology now and into the future.
Special structures for use with linear data tape
When used with linear data tape typical in large-scale archives today, an AXF implementation includes three additional structures to incorporate key, self-describing characteristics on the medium itself, ensuring full recoverability and transportability:
ISO/ANSI standard VOL1 volume label: The first structure on the medium is an ISO/ANSI standard VOL1 volume label. This label identifies a tape volume and its owner. It is included for compatibility with legacy applications, to ensure they do not erroneously handle AXF-formatted media, and to signal applications that do understand AXF to proceed with accessing the objects contained on the medium.
Medium Identifier: The second structure is the Medium Identifier, which contains the AXF volume signature and other information about the storage medium itself. The implementation of the Medium Identifier differs slightly depending on whether the storage medium is linear or nonlinear, and whether it includes a file system or not, but the overall structures are fully compatible.
AXF Object Index: The third structure is the AXF Object Index, which is an optional structure that assists in the recoverability of AXF-formatted media. Information contained in this structure is sufficient to recover and rapidly reconstruct the entire catalog of AXF Objects on the storage medium. In a case where the application has not maintained the optional AXF Object Index structures, the contents of each AXF Object can still be reconstructed by simply processing each AXF Object Footer structure.
Who is the ideal AXF user?
AXF was developed to meet a broad spectrum of user needs, from those accessing petabytes of data in a high-performance environment to those looking to simply encapsulate a few files and send them to a friend via email. AXF is completely scalable to accommodate an operation of any size or complexity. In all cases, AXF offers an abstraction layer that hides the complexities of the storage technology from the higher-level applications, while it also offers fundamental encapsulation, provenance, fixity, portability and preservation characteristics. In addition, the same self-describing AXF format can be used interchangeably on all current storage technologies, such as spinning disk, flash media and data tape from any manufacturer now and into the future.
The bottom line
AXF has the ability to support interoperability among systems, help ensure long-term accessibility to valued assets and keep up with evolving storage technologies. It offers profound present and future benefits for any enterprise that uses media — from heritage institutions, to schools, to broadcasters, to simple IT-based operations — and is well on its way to becoming the long-awaited, worldwide, open standard for file-based archiving, preservation and exchange.
Brian Campanotti is CTO at Front Porch Digital.