For a media asset management (MAM) system to be truly valuable, it must offer much more than a repository for content with associated metadata and storage services. In today’s file-based, software-orientated world, a MAM system must offer services for content processing and manipulation. In doing so, it orchestrates people and wider enterprise resources. Media processing is key to putting content to work in the world of post production, broadcast and distribution. Such processing can happen outside a MAM system or inside as an integrated system. Either way, processing high volumes of large media files requires careful thought.
This article covers some useful software engineering approaches that can be followed when building scalable MAM systems. The main focus is on the key concepts and components that must come together in an enterprise media processing component. (See Figure 1.)
For a MAM system to be truly useful, every object and component must work within a multi-tenanted environment and, therefore, support access control and ownership. This requirement is critical, and the rest of the article shows why.
Jobs and actions
An action is a unit of work or software plug-in that is executed against an asset or group of assets. It is the fundamental building block for media processing. An action has a type that defines the kind of work that it will carry out. It can be a general-purpose file action, such as copy and move, or a media-centric action, such as transcode, QC, package or deliver.
The key to the extensibility of this paradigm is that new action types or software interfaces can be created. So, to create new types of transcodes or QCs, for example, one simply needs to implement the related action type interface to change the underlying behavior. For a Deliver Action, for instance, one may develop concrete implementations of action adaptors for delivering to Dailymotion and YouTube, or to a broadcast system.
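The adaptor idea can be sketched in Python as follows. This is a minimal illustration, not the design of any particular MAM product: the class names, the `execute` method, the `ADAPTORS` registry and the `deliver` helper are all assumptions made for the example, and the adaptors only return a message where a real one would call the destination's upload interface.

```python
from abc import ABC, abstractmethod


class DeliverAction(ABC):
    """Hypothetical action type interface for delivery actions."""

    action_type = "deliver"

    @abstractmethod
    def execute(self, asset_path):
        """Deliver the asset and return a short description of the result."""


class YouTubeDeliverAction(DeliverAction):
    def execute(self, asset_path):
        # A real adaptor would call the destination's upload API here.
        return f"delivered {asset_path} to YouTube"


class BroadcastDeliverAction(DeliverAction):
    def execute(self, asset_path):
        # Placeholder for a hand-off to a broadcast system.
        return f"delivered {asset_path} to broadcast system"


# Hypothetical registry: adding a destination means adding one entry here.
ADAPTORS = {
    "youtube": YouTubeDeliverAction,
    "broadcast": BroadcastDeliverAction,
}


def deliver(destination, asset_path):
    """Look up the adaptor for a destination and run it against the asset."""
    adaptor = ADAPTORS[destination]()
    return adaptor.execute(asset_path)
```

The calling code depends only on the interface, so a new destination can be supported by writing one more subclass and registering it, without touching `deliver`.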
As a unit of work, the action must run within a runtime environment. In the case of media processing, this requires careful consideration as actions are often expected to run for an extended period. With this in mind, any runtime environment must be asynchronous. It must also be transactional to enable rollback from failed media processing actions such as transcodes, file moves and copies.
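As a rough illustration of these two properties, the sketch below runs a file copy asynchronously and removes the partial output if the action fails. It assumes Python 3.9 or later for `asyncio.to_thread`; the function names and the `fail` flag, which simulates a failure, are hypothetical and exist only for the example.

```python
import asyncio
import os
import shutil
import tempfile


async def copy_with_rollback(src, dst, fail=False):
    """Copy src to dst off the event loop; remove the output on failure."""
    try:
        # Run the blocking copy in a worker thread so the event loop stays free.
        await asyncio.to_thread(shutil.copyfile, src, dst)
        if fail:
            # Simulated failure after the copy, for illustration only.
            raise RuntimeError("simulated failure after copy")
    except Exception:
        # Rollback: undo the side effect so no half-finished file is left.
        if os.path.exists(dst):
            os.remove(dst)
        raise


def demo():
    """Run one successful and one failing copy; report which outputs exist."""
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "source.mxf")
    with open(src, "wb") as handle:
        handle.write(b"media bytes")
    ok_dst = os.path.join(workdir, "ok.mxf")
    bad_dst = os.path.join(workdir, "bad.mxf")
    asyncio.run(copy_with_rollback(src, ok_dst))
    try:
        asyncio.run(copy_with_rollback(src, bad_dst, fail=True))
    except RuntimeError:
        pass
    result = (os.path.exists(ok_dst), os.path.exists(bad_dst))
    shutil.rmtree(workdir)
    return result
```

A production runtime would need rollback logic specific to each action type; a failed move, for example, has to restore the source as well as remove the destination.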
The job is the fundamental wrapper for the action, bringing wider services such as state, transactional integrity, priority and timing. A job incorporates a requirement for access to resources of a given type. In addition, a job points to a type of action and is configured to run at a certain time and with a certain priority. Jobs can be persisted in a database so that the state of the job can be retained in perpetuity for auditing and reporting purposes. Retaining its state also means a failed job can be retried, rescheduled and reprioritized if required.
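One possible shape for such a job record is sketched below. The field names, the set of states and the `retry` method are assumptions chosen for illustration; persistence is left out, although the `history` list hints at the audit trail a database would retain.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class JobState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class Job:
    action_type: str    # the type of action the job points to
    resource_type: str  # the type of resource the job requires
    priority: int = 5   # assumption: a lower number means more urgent
    run_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    state: JobState = JobState.PENDING
    attempts: int = 0
    history: list = field(default_factory=list)

    def transition(self, new_state):
        """Record the state change so it can be audited later."""
        self.history.append((self.state, new_state))
        self.state = new_state

    def retry(self, priority=None, run_at=None):
        """Put a failed job back in the queue, optionally reprioritized or rescheduled."""
        if self.state is not JobState.FAILED:
            raise ValueError("only failed jobs can be retried")
        if priority is not None:
            self.priority = priority
        if run_at is not None:
            self.run_at = run_at
        self.attempts += 1
        self.transition(JobState.PENDING)
```

Because the job keeps its own state and history, a failed transcode can be resubmitted with a higher priority without losing the record of the earlier attempt.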
Given that media processing is resource-intensive and that such jobs can last for an extended period, a job is run in its own execution context by a job scheduler. A scheduler is responsible for preventing jobs from interfering with each other. If jobs are allowed to contend for resources, they will generally decrease the performance of the cluster, delay the execution of these jobs and possibly cause one or more of the jobs to fail. The scheduler is responsible for internally tracking and dedicating requested resources to a job, thus preventing use of these resources by other jobs. When clusters or other high-performance computing (HPC) platforms are created, they are typically created for one or more specific purposes.
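The resource tracking described here can be sketched with a small in-memory bookkeeping class. `ResourceTracker` and its methods are hypothetical names; a real scheduler would obtain capacity from a resource manager rather than from a constructor argument.

```python
class ResourceTracker:
    """Tracks free units per resource type and which job holds what."""

    def __init__(self, capacity):
        self._free = dict(capacity)  # resource type -> free units
        self._held = {}              # job id -> (resource type, units)

    def acquire(self, job_id, resource_type, units=1):
        """Dedicate units to a job; return False if not enough are free."""
        if job_id in self._held:
            raise ValueError(f"job {job_id} already holds resources")
        if self._free.get(resource_type, 0) < units:
            return False
        self._free[resource_type] -= units
        self._held[job_id] = (resource_type, units)
        return True

    def release(self, job_id):
        """Return a finished job's units to the free pool."""
        resource_type, units = self._held.pop(job_id)
        self._free[resource_type] += units
```

With a single transcoder available, a second job asking for it is refused until the first releases it, which keeps the two from contending for the same resource.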
The job scheduler polls the jobs residing in the underlying job store (database) and executes them in an execution context. This context is injected into the job, thus allowing a running job to have access to systemwide services such as logging and system state.
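A single polling pass might look like the sketch below, which uses an in-memory SQLite database as a stand-in for the job store. The table layout, the `ExecutionContext` fields, the `ACTIONS` registry and the function names are assumptions for illustration only; each action is run inline here, whereas a real scheduler would hand it off to its own execution context.

```python
import logging
import sqlite3
from dataclasses import dataclass


@dataclass
class ExecutionContext:
    """Systemwide services handed to each running job."""
    logger: logging.Logger
    system_state: dict


def run_copy(context, asset):
    # The action reaches logging and system state only through the context.
    context.logger.info("copying %s", asset)
    context.system_state["last_asset"] = asset


# Hypothetical registry mapping action types to callables.
ACTIONS = {"copy": run_copy}


def make_store():
    """Create an in-memory job store with a minimal, assumed schema."""
    store = sqlite3.connect(":memory:")
    store.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, action_type TEXT, "
        "asset TEXT, priority INTEGER, state TEXT)"
    )
    return store


def poll_once(store, context):
    """Run every pending job in priority order and persist its final state."""
    rows = store.execute(
        "SELECT id, action_type, asset FROM jobs "
        "WHERE state = 'pending' ORDER BY priority, id"
    ).fetchall()
    for job_id, action_type, asset in rows:
        try:
            ACTIONS[action_type](context, asset)
            new_state = "completed"
        except Exception:
            context.logger.exception("job %s failed", job_id)
            new_state = "failed"
        store.execute("UPDATE jobs SET state = ? WHERE id = ?", (new_state, job_id))
    store.commit()
    return len(rows)
```

Because the final state is written back to the store, a failed job remains visible for reporting and can be picked up again later.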
In any given scheduling iteration, many activities take place. These are broken into the following categories:
- Update state information. During each iteration, the scheduler contacts the resource manager(s) and requests up-to-date information on compute resources, workload and policy.
- Refresh reservations.
- Schedule reserved jobs.
- Schedule priority jobs. In scheduling jobs, multiple steps occur.
- Backfill jobs.
- Update statistics.
- Handle user requests. User requests include any call requesting state information, configuration changes, or job or resource manipulation commands. These requests may come in the form of user client calls, peer daemon calls or process signals.
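Two of the steps above, scheduling priority jobs and backfilling, can be sketched as follows. This is a deliberately simplified model: jobs are hypothetical `(job_id, priority, slots_needed)` tuples, a lower priority number is assumed to run first, and run time is ignored. A production backfill step would also check that a backfilled job finishes before the blocked job's reservation begins.

```python
def schedule_iteration(pending, free_slots):
    """Return (started job ids, remaining free slots) for one simplified pass.

    pending is a list of (job_id, priority, slots_needed) tuples.
    """
    started = []
    queue = sorted(pending, key=lambda job: job[1])

    # Priority pass: start jobs in priority order until one does not fit.
    index = 0
    while index < len(queue) and queue[index][2] <= free_slots:
        job_id, _, slots = queue[index]
        free_slots -= slots
        started.append(job_id)
        index += 1

    # Backfill pass: the job at `index` is blocked, so use leftover slots
    # for lower-priority jobs that are small enough to fit.
    for job_id, _, slots in queue[index + 1:]:
        if slots <= free_slots:
            free_slots -= slots
            started.append(job_id)

    return started, free_slots
```

With three free slots and pending jobs `a` (2 slots), `b` (4 slots) and `c` (1 slot) in that priority order, `a` starts, `b` is blocked, and `c` is backfilled into the remaining slot.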