As broadcast facilities and post-production houses implement new workflow and media management systems, they deal with an array of new IT-based technologies and are inundated with acronyms associated with those technologies, as well as the best practices concerning their use. Adoption of these technologies continues apace as convergence spreads across the media industry. Despite this rapid shift of operations into the IT realm, however, the industry as a whole has been slower to educate engineering staff on the meaning of terms such as XML, WSDL, SOAP, SOA and REST, the technologies underpinning them, and the role they play in present and future media-focused operations. This article reviews the most relevant acronyms, their fundamental principles and their typical applications.
XML and XML schema
Complex control systems depend on a reliable interchange of information in order to make the myriad business decisions that guide material through workflow. This is made simpler if individual systems can swap information in the form of structured data — that is, documents that contain both content and an indication of the content's role in the document. The concept of structured data is not new; humans have been dealing with information formatted this way for centuries. Examples include published (human-readable) books and magazines, as well as web pages displayed by a browser. In both cases, these systems indicate to the consumer the relevance and importance of any particular item on the page through the use of typographical hints — such as underlining text that hyperlinks to other documents. HTML achieves this by including “tags” in the document, and the browser uses those tags to determine how to emphasize the relative value of the associated data.
As a markup language, HTML represents a fairly limited set of standardized tags intended for the presentation of web content, so its utility beyond that scope likewise is limited.
XML, however, offers flexibility that makes it more suitable for commercial use. This is true because, unlike HTML, XML allows users to define their own tags. Rather than defining the tags themselves (or the semantics of the tags), XML provides a facility for defining tags and using them within a document. As with HTML, data is then bracketed within opening and closing tags, which the receiving device can read and act on as appropriate. Figure 1 shows an example of the use of a tag “author” to identify a book's author in a library document.
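As a hedged illustration of user-defined tags, the short Python sketch below parses a small XML document with the standard library and reads the data bracketed by the "author" tags. It is not a reproduction of Figure 1: the tag names `library`, `book`, `title` and `author`, and all of the sample values, are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical library document; tag names and values are illustrative only.
LIBRARY_XML = """
<library>
  <book>
    <title>An Example Title</title>
    <author>A. N. Author</author>
  </book>
  <book>
    <title>Another Example</title>
    <author>B. Writer</author>
  </book>
</library>
"""

def list_authors(xml_text):
    """Return the text of every <author> element, in document order."""
    root = ET.fromstring(xml_text.strip())
    return [author.text for author in root.iter("author")]

authors = list_authors(LIBRARY_XML)
print(authors)
```

The receiving code never needs to know where on a page the author's name would be printed; it simply asks for the data by the role its tag gives it.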
Because XML tags are user-definable, an XML schema is used to define which tags are valid in a particular document. Such definitions describe, for example, the elements that can appear in a document (along with their attributes), whether an element is empty or can include text, and the data types for elements and attributes. Thus, a document can be compared to an XML schema, which acts much like a blueprint, style guide or template, to ensure the document is valid and contains all relevant information. Again, this idea is neither new nor unique to XML. Most databases have some sort of schema. Also, textbooks have used schemas for years, generally referring to them as “style guides.” Figure 2 shows an example of an XML schema for a memo.
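To make the blueprint idea concrete, here is a deliberately simplified sketch. It does not use a real XML Schema (XSD) file or an XSD validator; instead, a hypothetical Python dictionary, `MEMO_RULES`, stands in for the schema. The element names (`memo`, `to`, `from`, `heading`, `body`) are assumptions and are not taken from Figure 2. The point is only to show a document being compared against a set of rules.

```python
import xml.etree.ElementTree as ET

# Stand-in for a schema (not real XSD): the required root element, and the
# required child elements in order, each flagged if it must contain text.
MEMO_RULES = {
    "root": "memo",
    "children": [("to", True), ("from", True), ("heading", True), ("body", True)],
}

def validate_memo(xml_text, rules=MEMO_RULES):
    """Return a list of problems; an empty list means the memo conforms."""
    root = ET.fromstring(xml_text)
    if root.tag != rules["root"]:
        return [f"root element is <{root.tag}>, expected <{rules['root']}>"]
    problems = []
    expected = [name for name, _ in rules["children"]]
    actual = [child.tag for child in root]
    if actual != expected:
        problems.append(f"expected children {expected}, found {actual}")
    for name, needs_text in rules["children"]:
        element = root.find(name)
        if element is not None and needs_text and not (element.text or "").strip():
            problems.append(f"<{name}> must not be empty")
    return problems

GOOD_MEMO = ("<memo><to>Ops</to><from>Engineering</from>"
             "<heading>Reminder</heading><body>QC at noon.</body></memo>")
BAD_MEMO = "<memo><to>Ops</to><heading>Reminder</heading><body></body></memo>"

print(validate_memo(GOOD_MEMO))
print(validate_memo(BAD_MEMO))
```

In practice a facility would rely on a schema-aware XML library rather than hand-written checks, but the outcome is the same: a document either conforms to the agreed blueprint or is rejected with a list of reasons.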
SOAP and SOA
Simple Object Access Protocol (SOAP) is a lightweight protocol for the exchange of information. It describes three main elements: an envelope, the encoding rules, and a convention for the representation of remote procedure calls and responses. As of version 1.2 of the specification, the name SOAP is no longer treated as an acronym; however, the technology remains very much in use.
The name SOAP is sometimes confused with Service-Oriented Architecture (SOA). Though SOAP standards may be part of an SOA application, the two terms are not related.
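To illustrate the envelope that SOAP describes, the following sketch builds a minimal SOAP 1.2 message with Python's standard library. The envelope namespace is the one defined by SOAP 1.2; everything else (the `urn:example:media` namespace, the `GetStatus` operation and its `jobId` parameter) is hypothetical and chosen only for illustration. The optional header and the encoding rules are left out.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
APP_NS = "urn:example:media"                         # hypothetical application namespace

def build_envelope(operation, params):
    """Wrap an RPC-style request in a SOAP Envelope/Body pair."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # The procedure call is an element named after the operation,
    # with one child element per parameter.
    request = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(request, f"{{{APP_NS}}}{name}").text = value
    return envelope

envelope = build_envelope("GetStatus", {"jobId": "1234"})
print(ET.tostring(envelope, encoding="unicode"))
```

The envelope carries no knowledge of what `GetStatus` means; it only frames the request so that any SOAP-aware receiver can locate the body and hand it to the right application code.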
SOA is a design philosophy that separates the core functions of a business into independent modules. These modules are referred to as services — essentially software functions — that are called by one or more other services in the overall workflow. (Media transcoding is an example of a service that could be invoked as part of an overall workflow management system.)
Operating on the principle of loosely coupled systems, SOA abstracts the functions of a service from the low-level substructures of that service, leaving the calling service to use much higher-level language to drive the process. In doing so, SOA can make it easier to choreograph multiple business activities and processes among multiple complex software systems, all working under the control of a central process to achieve the required goal.
Consider the operation of a typical media enterprise: Material comes into a facility on tape, via satellite or as a file transfer, and content from each of these sources must be processed before it can be used. Tapes must be ingested, satellite feeds must be fed through some sort of integrated receiver/decoder (IRD), and files must be received and error-corrected. Some form of transcoding may then be applied in order to provide the material in the house format. A QC stage may then be applied to ensure technical compliance. In an SOA-based system, each of these steps would be a separate software module, or service.
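The chain of steps just described can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a real workflow engine: the functions `ingest`, `transcode` and `quality_check`, the job fields and the status strings are all hypothetical. Each service is an independent function, and a central process decides the order in which they are called.

```python
def ingest(job):
    """Hypothetical ingest service: marks the material as received."""
    return {**job, "status": "ingested"}

def transcode(job):
    """Hypothetical transcode service: converts to the house format."""
    return {**job, "format": job["house_format"], "status": "transcoded"}

def quality_check(job):
    """Hypothetical QC service: records a technical-compliance result."""
    return {**job, "status": "qc_passed"}

def run_workflow(job, services):
    """Central process: call each service in turn and log the statuses."""
    history = []
    for service in services:
        job = service(job)
        history.append(job["status"])
    return job, history

job = {"source": "satellite", "format": "MPEG-2 TS", "house_format": "HOUSE"}
final_job, history = run_workflow(job, [ingest, transcode, quality_check])
print(final_job["format"], history)
```

Because the central process knows the services only by what they accept and return, a step can be added, removed or swapped by changing the list passed to `run_workflow`, without touching the other services.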
Without SOA, setting up such systems is a time-consuming process that likely will demand customization so that one vendor's automation/asset management system can talk to another vendor's software module or hardware processor. Any operational or technical changes may require replacement of a module or augmentation with another vendor's module.
For example, in a traditional workflow, the automation system must have deep knowledge of the proprietary API calls required to tell a transcoder to transcode a particular piece of material to another format. Every time a new format is added to the transcoder by its manufacturer, that interface must be modified (or in the worst case, completely rewritten). The custom integration software needed to address systems from different vendors thus represents a potentially high investment in time and money.
In a more agile SOA-based architecture, the automation system would simply send an XML message that says, “Transcode file X to format Y, and store the result as Z.” This message remains consistent, even if a new transcoder or format is added. Regardless of the vendor, equally equipped systems will perform the same basic transcode when given the same command. Knowing this, engineers can take a more flexible approach to workflow design. This model depends on the fact that each service describes itself and its capabilities to the rest of the system. Web Services Description Language (WSDL) facilitates this exchange of information.
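The vendor-neutral command described above ("Transcode file X to format Y, and store the result as Z") might look something like the following sketch. The message layout (`transcodeRequest` with `source`, `format` and `destination` children) and the two vendor classes are assumptions invented for this example; no real product's interface is implied.

```python
import xml.etree.ElementTree as ET

def make_transcode_request(source, target_format, destination):
    """Build the vendor-neutral 'transcode X to Y, store as Z' message."""
    request = ET.Element("transcodeRequest")
    ET.SubElement(request, "source").text = source
    ET.SubElement(request, "format").text = target_format
    ET.SubElement(request, "destination").text = destination
    return ET.tostring(request, encoding="unicode")

class VendorATranscoder:
    """Hypothetical service; its internals are hidden from the caller."""
    def handle(self, message):
        req = ET.fromstring(message)
        return (f"A: {req.findtext('source')} -> "
                f"{req.findtext('destination')} as {req.findtext('format')}")

class VendorBTranscoder:
    """A second hypothetical vendor that reads the same message differently."""
    def handle(self, message):
        job = {child.tag: child.text for child in ET.fromstring(message)}
        return f"B: {job['source']} -> {job['destination']} as {job['format']}"

message = make_transcode_request("X.mxf", "Y", "Z.mxf")
for service in (VendorATranscoder(), VendorBTranscoder()):
    print(service.handle(message))
```

The automation system builds one message and can hand it to either service; how each vendor carries out the transcode internally is of no concern to the caller.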
WSDL is an XML-based language used for describing features and capabilities of a particular service. Provided to the central controlling system, this information ensures all other services and applications are aware of a particular service's capabilities. Four critical pieces are supplied:
Interface information describing all publicly available functions;
Data type information for all message requests and message responses;
Binding information about the transport protocol to be used in calling the service; and
Address information for locating the specified service.
When several similar services exist in one environment, information delivered via WSDL allows an application to decide which of those is most appropriate for the current task and to engage that service as required.
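As a rough sketch of how a controlling application might read those four pieces of information, the Python below parses a heavily trimmed WSDL 1.1 description. The two namespace URIs and the SOAP-over-HTTP transport URI are the standard WSDL 1.1 values; the service itself (`TranscodeService`, its messages and its address) is hypothetical, and the document omits details that a complete WSDL file would need.

```python
import xml.etree.ElementTree as ET

NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/",
      "soap": "http://schemas.xmlsoap.org/wsdl/soap/"}

# Hypothetical, incomplete WSDL 1.1 description of a transcode service.
WSDL_TEXT = """
<definitions name="TranscodeService" targetNamespace="urn:example:transcode"
    xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns:tns="urn:example:transcode"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <message name="TranscodeRequest">
    <part name="source" type="xsd:string"/>
    <part name="format" type="xsd:string"/>
  </message>
  <message name="TranscodeResponse">
    <part name="jobId" type="xsd:string"/>
  </message>
  <portType name="TranscodePort">
    <operation name="Transcode">
      <input message="tns:TranscodeRequest"/>
      <output message="tns:TranscodeResponse"/>
    </operation>
  </portType>
  <binding name="TranscodeBinding" type="tns:TranscodePort">
    <soap:binding style="rpc" transport="http://schemas.xmlsoap.org/soap/http"/>
  </binding>
  <service name="TranscodeService">
    <port name="TranscodePort" binding="tns:TranscodeBinding">
      <soap:address location="http://transcoder.example/soap"/>
    </port>
  </service>
</definitions>
"""

def summarize(wsdl_text):
    """Extract interface, data type, binding and address information."""
    root = ET.fromstring(wsdl_text.strip())
    return {
        # Interface: the publicly available operations.
        "operations": [op.get("name")
                       for op in root.findall("wsdl:portType/wsdl:operation", NS)],
        # Data types: the parts of each request and response message.
        "data_types": {
            msg.get("name"): {p.get("name"): p.get("type")
                              for p in msg.findall("wsdl:part", NS)}
            for msg in root.findall("wsdl:message", NS)
        },
        # Binding: the transport protocol used to call the service.
        "transport": root.find("wsdl:binding/soap:binding", NS).get("transport"),
        # Address: where the service can be reached.
        "address": root.find("wsdl:service/wsdl:port/soap:address", NS).get("location"),
    }

summary = summarize(WSDL_TEXT)
print(summary)
```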
REST
Representational State Transfer (REST) is an architectural style for distributed systems such as the World Wide Web. It is not tied to XML and is most commonly implemented over HTTP. It addresses the scalability of component interaction, the generality of interfaces and the deployment of components in ways that reduce latency and enforce security. Unlike SOA, which often is considered a peer-to-peer architecture, REST is oriented toward client/server interaction in which clients initiate requests to servers, and servers process those requests and return appropriate responses.
In a RESTful transaction, requests and responses are built around the transfer of representations of resources, which themselves are specific information sources. A representation of a resource is typically a document that captures a resource's current or intended state. Each resource is referenced with a global identifier. To manipulate these resources, components of the network — typically user agents and origin servers — communicate via a standardized interface like HTTP and exchange representations.
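The exchange described above can be modeled without any networking at all. In the sketch below, an in-memory dictionary stands in for an origin server, the keys are resource identifiers, and a single `handle` function plays the part of the standardized interface using HTTP-style methods and status codes. The function name, the `/clips/42` identifier and the choice of status codes are assumptions for illustration; a real service would sit behind an HTTP server.

```python
# In-memory stand-in for an origin server: maps a resource identifier (URI)
# to the resource's current state.
resources = {}

def handle(method, uri, representation=None):
    """Apply an HTTP-style method to the resource named by `uri`.

    Returns a (status_code, representation) pair.
    """
    if method == "PUT":
        created = uri not in resources
        # The client transfers a representation of the intended state.
        resources[uri] = dict(representation or {})
        return (201 if created else 200), dict(resources[uri])
    if method == "GET":
        if uri not in resources:
            return 404, None
        # The server transfers a representation of the current state.
        return 200, dict(resources[uri])
    if method == "DELETE":
        if resources.pop(uri, None) is None:
            return 404, None
        return 204, None
    return 405, None  # method not supported by this sketch

print(handle("PUT", "/clips/42", {"title": "Promo"}))
print(handle("GET", "/clips/42"))
```

The same small set of methods applies to every resource; only the identifier and the representation change from one request to the next.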
What is a representation of a resource? In the case of web page information, the raw data on which the page is built is the resource, but the representation of that data could be different for different users. Consider a human using a browser to view a web page in order to retrieve information, and a computer doing the same as part of some larger activity. Both navigate to the same page, but the computer has no interest in the page's layout and styling; it just wants the data. The human does care about layout and styling (which, after all, exist to make the page easier for humans to digest). The two clients (human and computer) want different representations of the resource, both derived from the same raw data source.
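A minimal sketch of this idea follows: one resource, two representations, selected by the media type the client says it accepts. The `CLIP` data and the `represent` function are hypothetical, and the matching on the accept string is deliberately naive (a real server would parse the HTTP Accept header properly, including its quality values).

```python
import html
import json

# The resource: raw data about a hypothetical clip.
CLIP = {"id": "clip-42", "title": "Evening News Open", "duration_s": 30}

def represent(resource, accept):
    """Return (content_type, body) for the media type the client prefers."""
    if "application/json" in accept:
        # Machine client: just the data.
        return "application/json", json.dumps(resource, sort_keys=True)
    # Human client (browser): the same data with presentation markup.
    rows = "".join(
        f"<li><b>{html.escape(str(key))}</b>: {html.escape(str(value))}</li>"
        for key, value in resource.items()
    )
    return "text/html", f"<ul>{rows}</ul>"

print(represent(CLIP, "application/json"))
print(represent(CLIP, "text/html"))
```

Both responses are derived from the same `CLIP` data; only the form in which that data is transferred differs.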
A basic overview of key acronyms can go a long way in helping engineers to understand and implement IT-based technologies and solutions successfully within their own facilities. The topics discussed here are complex and the subject of much discussion and documentation. Engineers can be certain these technologies will become an increasingly important part of the media ecosystem, and they should strive to continue their education in IT-based technologies. A wealth of additional reading material is readily available on the Internet.
To paraphrase a leading engineering manager: “Those engineers who adapt to new technology will do well. Those who do not will do less well.”
I hope you do well.
Paul Turner is vice president, broadcast market development, Harmonic.