Deciding on a file format
Mar 1, 2009 12:00 PM, By Peter Thomas
Embedded metadata
Because wrapper formats allow metadata to be embedded, they can enable certain file-based workflows. However, there are caveats with descriptive metadata. Specifically, no recommendations exist that pin down the full semantics of the metadata. SMPTE's metadata dictionary helps to define the semantics of individual attributes. However, tying those attributes to specific entities requires two more things: a fully specified standard reference data model, and a specification of how to map that model to the metadata embedded in the file.
If such a data model were available, an organization could map its own data model to this reference model, thus ensuring that the semantics of the embedded metadata are clearly articulated. If an organization instead maps its proprietary data model directly to the embedded metadata, the result is just as proprietary; no other organization knows the semantics, so none can reliably interpret the data when receiving the file. Embedded descriptive metadata is useful for file exchange between two systems or two organizations only if the semantics are unambiguously agreed upon.
Embedded metadata in archives
In general, the usefulness of embedded descriptive metadata in archived files is questionable. At first glance, embedding descriptive metadata seems to offer two advantages for archiving:
- The file can be identified even without a database referencing it;
- In case of a loss of the database, basic information can be restored from the file.
To qualify the first point, it's important to understand that, in a digital archive, hundreds of thousands of files reside on IT storage systems, primarily data tape storage vaults. There is no way that a user could find a file by exploring file-level metadata, as this metadata is not searchable. Only when metadata is maintained in a database, or as an index in a search engine, can users search for and find content. As the full metadata is available in the MAM database, using the MAM search functions is the only sensible way to search for a file.
For the second point, the IT industry agrees that the right way to protect a database is standard IT database backup. Restoring a failed database from a backup typically takes a matter of hours. Rebuilding it from embedded metadata, by contrast, means reading every archived file back from tape. For an archive of 100,000 hours of content in DV50 with eight audio tracks, that would take close to 260 days using a single LTO-4 drive. Even if the process used 10 drives in parallel, it would still take almost four weeks to retrieve and analyze the files. It is more sensible to invest in standard IT protection mechanisms and apply the related best practices.
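The restore-time estimate can be reproduced with a few lines of arithmetic. This is only a sketch: the article does not state the data rate it used, so the total of 60 Mb/s for DV50 video plus eight audio tracks is an assumption chosen to be plausible for that format. The 120 MB/s figure is the native (uncompressed) transfer rate of an LTO-4 drive.

```python
# Back-of-the-envelope time to read an entire archive back from tape.
ARCHIVE_HOURS = 100_000        # archive size given in the article
TOTAL_BITRATE_MBPS = 60        # ASSUMPTION: DV50 video plus eight audio tracks, in Mb/s
DRIVE_RATE_MB_PER_S = 120      # LTO-4 native transfer rate, in MB/s
SECONDS_PER_DAY = 86_400


def restore_days(drives: int = 1) -> float:
    """Days needed to stream the whole archive with `drives` drives in parallel."""
    archive_bytes = ARCHIVE_HOURS * 3600 * TOTAL_BITRATE_MBPS * 1_000_000 / 8
    throughput_bytes_per_s = drives * DRIVE_RATE_MB_PER_S * 1_000_000
    return archive_bytes / throughput_bytes_per_s / SECONDS_PER_DAY


print(f"1 drive:   {restore_days(1):.0f} days")   # 260 days
print(f"10 drives: {restore_days(10):.0f} days")  # 26 days
```

Under these assumptions the archive holds about 2.7 PB. The calculation covers pure streaming time only; tape mounts, seeks, and the analysis of each file would add to it.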
There is also the issue of updating embedded metadata. Unfortunately, metadata changes are quite common. Even basic metadata, such as titles, may change during the content lifecycle. Hence, embedded metadata that is not updated is likely to become outdated quickly.
However, the primary long-term storage technology used in TV archives today is digital data tape, and it is difficult to apply changes to files hosted on data tape. You have to restore the file from tape to disk, apply the change, write the updated file to tape, mark the former version as invalid, and remove it via defragmentation. Defragmentation means copying all valid files to a new tape and releasing the old tape for reuse. In real-world archives, this is not feasible.
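The cost of such an update is easier to see in a toy model. The sketch below treats a tape as an append-only list; the class and method names (`Tape`, `update_title`, `defragment`) are hypothetical and do not correspond to any real archive system's API. The restore-to-disk step is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class TapeFile:
    name: str
    title: str           # stands in for the embedded metadata
    valid: bool = True


@dataclass
class Tape:
    files: list = field(default_factory=list)  # append-only, like a data tape

    def write(self, name: str, title: str) -> None:
        self.files.append(TapeFile(name, title))

    def update_title(self, name: str, new_title: str) -> None:
        # A file on tape cannot be edited in place: mark the old copy invalid
        # and append a complete new copy that carries the changed metadata.
        for f in self.files:
            if f.name == name and f.valid:
                f.valid = False
        self.write(name, new_title)

    def defragment(self) -> "Tape":
        # Reclaim space by copying only the valid files to a fresh tape.
        new_tape = Tape()
        for f in self.files:
            if f.valid:
                new_tape.write(f.name, f.title)
        return new_tape


tape = Tape()
tape.write("clip001.mxf", "Working title")
tape.write("clip002.mxf", "Evening news")
tape.update_title("clip001.mxf", "Final title")
print(len(tape.files))               # 3 entries: one is now dead space
print(len(tape.defragment().files))  # 2 entries after copying to a new tape
```

Changing a single title rewrites the whole file, and recovering the wasted space means copying every other valid file on the tape as well.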
A file may even reside in multiple, and potentially remote, locations. Here, metadata updates would require distributed transactions on all copies. Otherwise the database and the primary archived file go out of sync, and the files in the various locations would have different metadata.
Metadata in file exchange
Being able to embed metadata in files that will be exchanged with external business partners is useful, as it allows you to tightly couple metadata and essence. Within a business, metadata exchange can also be accomplished in alternative ways. Examples are partial database synchronization and exchange of metadata via API or XML files.
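As an illustration of the XML alternative, the following sketch writes descriptive metadata to a sidecar document and reads it back using Python's standard library. The element and attribute names (`metadata`, `field`, `essence`) are invented for this example and follow no standard; in practice both sides would have to agree on them.

```python
import xml.etree.ElementTree as ET


def build_sidecar(essence_file: str, fields: dict) -> str:
    """Serialize metadata fields to an XML string that refers to the essence file."""
    root = ET.Element("metadata", {"essence": essence_file})
    for name, value in fields.items():
        ET.SubElement(root, "field", {"name": name}).text = value
    return ET.tostring(root, encoding="unicode")


def read_sidecar(xml_text: str) -> dict:
    """Parse the sidecar XML back into a name-to-value dictionary."""
    root = ET.fromstring(xml_text)
    return {f.get("name"): f.text for f in root.findall("field")}


xml_text = build_sidecar("clip001.mxf", {"title": "Evening news", "language": "en"})
print(xml_text)
print(read_sidecar(xml_text))
```

Because the sidecar is separate from the essence, it can be regenerated from the database at delivery time without touching the media file.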
Due to the lack of recommendations and standards, partners that want to exchange content with embedded metadata have to agree upon the extent and the semantics of the metadata. The extent may differ depending on the type of transaction and the partner relationship.
Whether the embedded metadata you receive stays in the file or is deleted once it has been imported into the MAM database matters little, although retaining it is a good idea. Remember, though, that it will have to be updated when the file is retrieved from the archive and delivered to another partner, because the metadata may have been modified in the database in the meantime, or the new exchange may require a different set of metadata and semantics.
Peter Thomas is CTO for Blue Order Solutions.