There are many reasons to consider using products that can be monitored and controlled by software suites. The most obvious is the complexity of modern facilities and the need for an abstraction layer that makes that complexity less overwhelming for operators. Consider how many products a large plant contains today. Turner's Atlanta Network Transmission Center (Turner NetOps), for instance, has more than 4750 modules in 350 rack frames, all managed by a single network. Imagine how difficult it would be to chase configuration and fault issues in a system that large without software tools that abstract the complexity onto a single screen.
An installation of that size is off the scale for most facilities. But the second reason to consider software suites is the limited time available to do the job manually. We are often concerned about mean time between failures (MTBF), the average operating time between failures of a piece of equipment, estimated from statistical data on component and module reliability. It is an important measure, but mean time to repair (MTTR) is even more important. The MTBF might show that a distribution amplifier (DA) is unlikely to fail in your lifetime, but if it does fail, what matters most is finding the fault quickly and curing the problem, perhaps by swapping a spare module into the slot.
The MTTR is affected by many things. Suppose each module has a loud annunciator that sounds when it fails, assuming the circuit that drives the noisemaker does not fail at the same time. The MTTR would then be the time the alarm sounds before you finally hear it, plus the time to walk to the rack room and find the module, go back to the shop for a spare, and return to the rack room to install it in the slot. It's simple, quick and effective. If you are fleet of foot, the failure may last only a couple of minutes.
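To make the arithmetic concrete, here is a minimal Python sketch that treats the repair time for one scenario as the sum of the phases described above and plugs it into the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR). The phase durations and the 100,000-hour MTBF are hypothetical values chosen for illustration, not measurements from any facility.

```python
from dataclasses import dataclass


@dataclass
class RepairTimeline:
    """Phases of a single repair, in minutes (illustrative values only)."""
    detect: float       # until someone hears the annunciator
    locate: float       # walk to the rack room and find the module
    fetch_spare: float  # trip to the shop for a spare and back
    replace: float      # swap the module into the slot

    def repair_minutes(self) -> float:
        # Time to repair for this scenario is the sum of its phases
        return self.detect + self.locate + self.fetch_spare + self.replace


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state fraction of time the equipment is in service."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


MTBF_HOURS = 100_000.0  # hypothetical DA reliability figure

# Hypothetical timelines: a small plant versus a large one where alarms go unheard
small_plant = RepairTimeline(detect=1.0, locate=2.0, fetch_spare=3.0, replace=1.0)
large_plant = RepairTimeline(detect=10.0, locate=8.0, fetch_spare=5.0, replace=1.0)

for name, plan in (("small plant", small_plant), ("large plant", large_plant)):
    minutes = plan.repair_minutes()
    a = availability(MTBF_HOURS, minutes / 60.0)
    print(f"{name}: repair {minutes:.0f} min, availability {a:.8f}")
```

The MTBF is identical in both cases; only the repair time differs, and every minute spent walking the rows is a minute of outage.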
Now suppose the failure happens in a facility as large as NetOps. With some 350 racks, you might not hear the alarm unless you walk down every row, which could add several minutes. In large operations the rack room may also be some distance from the QC position or master control room (MCR). More precious time is lost while someone works out that the failure is perhaps not in an encoder but in a device feeding the encoder, and only then is a technician dispatched to roam the rack room.
Contrast this with a facility that has a monitoring and control system. As soon as the fault occurs, both the failed module and the next device in the signal path report a problem (loss of picture or audio, perhaps). You immediately know the location of the failed module, and maintenance can dispatch a technician with the right spare to the exact location, saving precious minutes. In a large facility, the monitoring system might switch to a redundant path as soon as the failure is sensed, so there is essentially no on-air impact. The failed module is still identified and replaced, restoring the redundant path within minutes.
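The localization logic described above can be sketched in a few lines of Python. Assuming the monitoring system knows the signal path as an ordered list of devices (upstream first) and the set of devices currently alarming, the most upstream alarming device is the likely culprit, because everything downstream of it alarms only for loss of input. The device names and the switching rule are hypothetical simplifications; a real system would also apply hysteresis before switching back and forth.

```python
from typing import Optional, Sequence, Set


def locate_fault(signal_path: Sequence[str], alarming: Set[str]) -> Optional[str]:
    """Return the most upstream alarming device, or None if the path is clean.

    Devices downstream of a failure alarm too (they lose their input),
    so the first alarming device in signal order is the likely failure.
    """
    for device in signal_path:
        if device in alarming:
            return device
    return None


def select_path(primary: Sequence[str], backup: Sequence[str],
                alarming: Set[str]) -> Sequence[str]:
    """Use the backup path only when the primary is faulted and the backup is clean."""
    if locate_fault(primary, alarming) is not None and locate_fault(backup, alarming) is None:
        return backup
    return primary


# Hypothetical signal chains, listed upstream to downstream
primary = ["router_out_12", "da_3b", "encoder_7"]
backup = ["router_out_13", "da_4a", "encoder_8"]

# The DA has failed; the encoder it feeds reports loss of picture
alarms = {"da_3b", "encoder_7"}

print(locate_fault(primary, alarms))         # da_3b
print(select_path(primary, backup, alarms))  # ['router_out_13', 'da_4a', 'encoder_8']
```

Checking the backup before switching avoids trading one failed path for another.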
This is a rather simplistic view of complex monitoring and control products, which often have many more features built in.
An important consideration is the ability to use the Simple Network Management Protocol (SNMP) to talk to devices from many manufacturers. SNMP exposes each device's parameters through a management information base (MIB), a standardized description of the objects that can be read or set. The device manufacturer provides the MIB for its equipment.
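Every object in a MIB is identified by a numeric object identifier (OID) in a global tree. The small Python sketch below maps a few OIDs from the standard MIB-II system group (RFC 1213) to their names, the kind of translation an NMS performs once it has loaded a MIB. Manufacturer MIBs place their objects under the enterprises branch, 1.3.6.1.4.1; the enterprise number in the example is made up for illustration, and a real NMS would parse MIB files rather than hard-code a table.

```python
# A few scalar objects from the standard MIB-II "system" group (RFC 1213)
MIB_II_SYSTEM = {
    "1.3.6.1.2.1.1.1": "sysDescr",
    "1.3.6.1.2.1.1.3": "sysUpTime",
    "1.3.6.1.2.1.1.5": "sysName",
    "1.3.6.1.2.1.1.6": "sysLocation",
}

# Manufacturer-specific MIBs live under this subtree
ENTERPRISES = "1.3.6.1.4.1"


def oid_name(oid: str) -> str:
    """Translate a numeric OID into a readable name using the table above."""
    parts = oid.split(".")
    # Try the longest known prefix first; what is left is the instance suffix
    for cut in range(len(parts), 0, -1):
        prefix = ".".join(parts[:cut])
        if prefix in MIB_II_SYSTEM:
            suffix = parts[cut:]
            return MIB_II_SYSTEM[prefix] + ("." + ".".join(suffix) if suffix else "")
    if oid.startswith(ENTERPRISES + "."):
        return f"enterprise-specific ({oid})"
    return f"unknown ({oid})"


print(oid_name("1.3.6.1.2.1.1.3.0"))      # sysUpTime.0
print(oid_name("1.3.6.1.4.1.99999.2.1"))  # hypothetical vendor object
```

Scalar objects such as sysUpTime are read with a trailing .0 instance number, which is why the lookup keeps any suffix after the matched name.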
An SNMP-managed network consists of three key components: the managed devices, the agents running in them, and the network management system (NMS). The NMS is software running on a networked computer. It communicates with each managed device through software in that device called the agent. Communication can be initiated by the management system, which polls devices for status (a GET request) or changes their parameters (a SET request). It can also be initiated by the device: the agent sends an unsolicited notification, called a trap, to report an event such as a failure. Though SNMP was developed for computer and networking hardware, it works perfectly well with many pieces of video-specific hardware. A video server system might report the status of its disk array, including failures, along with power supply voltages, fan status and the status of its MPEG I/O ports. A video switcher or router might report power and cooling status, including internal temperatures, and might also flag a missing reference or excessive bit errors on its inputs.
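The manager/agent relationship and the two directions of communication can be illustrated with a toy Python model. This is not real SNMP: there is no network transport, message encoding or community string, and the class names and the object names fanStatus, diskArray and outputFormat are hypothetical. It only shows the pattern, in which the NMS polls with get and changes settings with set, while the agent pushes a trap on its own when something fails.

```python
from typing import Callable, Dict, List, Tuple

Trap = Tuple[str, str, object]  # (agent name, object name, new value)


class Agent:
    """Toy stand-in for the agent software inside a managed device."""

    def __init__(self, name: str, values: Dict[str, object]):
        self.name = name
        self.values = dict(values)
        self.trap_sinks: List[Callable[[Trap], None]] = []

    def get(self, obj: str) -> object:
        return self.values[obj]

    def set(self, obj: str, value: object) -> None:
        self.values[obj] = value

    def report(self, obj: str, value: object) -> None:
        """Device-side event: record the new value and push a trap unprompted."""
        self.values[obj] = value
        for sink in self.trap_sinks:
            sink((self.name, obj, value))


class Manager:
    """Toy NMS: polls and configures agents, and collects the traps they send."""

    def __init__(self) -> None:
        self.traps: List[Trap] = []

    def subscribe(self, agent: Agent) -> None:
        agent.trap_sinks.append(self.traps.append)

    def poll(self, agent: Agent, obj: str) -> object:
        return agent.get(obj)             # manager-initiated, like a GET

    def configure(self, agent: Agent, obj: str, value: object) -> None:
        agent.set(obj, value)             # manager-initiated, like a SET


nms = Manager()
server = Agent("video_server_1", {"fanStatus": "ok", "diskArray": "ok"})
nms.subscribe(server)

print(nms.poll(server, "fanStatus"))      # ok
nms.configure(server, "outputFormat", "1080i")
server.report("diskArray", "degraded")   # agent-initiated, like a trap
print(nms.traps)                          # [('video_server_1', 'diskArray', 'degraded')]
```

The trap path is what lets a failure reach the operator without waiting for the next polling cycle.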
All of the data gathered via SNMP or proprietary software must be presented to the operator efficiently and succinctly, so most monitoring systems are highly graphical at the operator level. Most facility monitoring and control systems allow devices to be mapped onto a bitmap view of the facility floor plan. When a failure happens, there is no mistaking exactly where in the rack room the technician should go. A good monitoring system can also extend far beyond a single facility.
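Mapping devices onto a floor plan comes down to a lookup table that ties each device to both a physical location and a position on the floor-plan image. In the minimal Python sketch below, the table contents, pixel coordinates and field names are all hypothetical; it shows how a single alarm can yield both a dispatch note for the technician and a marker position for the display.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Placement:
    """Physical location of a device and its spot on the floor-plan bitmap."""
    room: str
    row: str
    rack: int
    slot: int
    x: int  # pixel column of the rack on the floor-plan image
    y: int  # pixel row of the rack on the floor-plan image


# Hypothetical table built when the monitoring system is commissioned
FLOOR_PLAN: Dict[str, Placement] = {
    "da_3b": Placement("Rack Room A", "C", 14, 7, x=412, y=188),
    "encoder_7": Placement("Rack Room A", "D", 2, 1, x=455, y=240),
}


def dispatch_note(device: str) -> str:
    """Text a technician could be sent: exactly where to go."""
    p = FLOOR_PLAN[device]
    return f"{device}: {p.room}, row {p.row}, rack {p.rack}, slot {p.slot}"


def alarm_marker(device: str) -> Tuple[int, int]:
    """Pixel position where a display would draw the alarm marker."""
    p = FLOOR_PLAN[device]
    return (p.x, p.y)


print(dispatch_note("da_3b"))   # da_3b: Rack Room A, row C, rack 14, slot 7
print(alarm_marker("da_3b"))    # (412, 188)
```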
For example, Eurovision operates network operations centers (NOCs) in many cities around the world, where programming can be handed off inbound or outbound to its complex transmission network. All of the interfaces and virtual circuits across its terrestrial and satellite network can be managed from the Eurovision Control Center in Geneva, as well as from operations centers in the United States and Asia. This allows fault chasing, and even complete configuration of the network, without any staff at the hardware location, which may be halfway around the world. The interface is highly graphical, so operators can see the status of each interface in real time as the network is reconfigured throughout the day. Systems like this are normally configured to send alarms to e-mail accounts or pagers so that failures are not missed for long.
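Alarm forwarding of this kind is usually a small policy layer on top of the monitoring data. The Python sketch below assumes a hypothetical policy: send an e-mail as soon as an alarm is raised, then page the on-call engineer if nobody has acknowledged it within a set time. The class names, the five-minute delay and the uplink name are illustrative assumptions, and actual delivery of e-mail or pages is left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Alarm:
    source: str
    raised_at: float            # minutes since an arbitrary reference time
    acknowledged: bool = False


@dataclass
class Notifier:
    """Toy policy: e-mail immediately, page if still unacknowledged later."""
    page_after_min: float = 5.0  # hypothetical escalation delay
    sent: List[str] = field(default_factory=list)

    def on_raise(self, alarm: Alarm) -> None:
        self.sent.append(f"email: {alarm.source} alarm")

    def tick(self, alarm: Alarm, now: float) -> None:
        """Call periodically; escalates to a page once, after the delay."""
        page = f"page: {alarm.source} unacknowledged"
        overdue = now - alarm.raised_at >= self.page_after_min
        if overdue and not alarm.acknowledged and page not in self.sent:
            self.sent.append(page)


notifier = Notifier()
alarm = Alarm("sat_uplink_3", raised_at=0.0)
notifier.on_raise(alarm)
for minute in (2.0, 6.0, 7.0):   # too early, escalate, no duplicate page
    notifier.tick(alarm, now=minute)
print(notifier.sent)  # ['email: sat_uplink_3 alarm', 'page: sat_uplink_3 unacknowledged']
```

Escalating only unacknowledged alarms keeps pagers quiet for faults that are already being handled.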
Finally, several video hardware manufacturers have developed systems that link monitoring tightly to the display of errors. Real-time monitoring can then start at the macro level, showing the status of entire nodes in a centralized broadcasting model, and “zoom in” to the micro level when a fault is detected. One manufacturer calls the macro view, used in the quiescent state, “lean back,” and the detailed view used when a problem requires full attention, “lean forward.” Tightly integrated products like these are particularly valuable in a real-time environment like broadcasting.
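The macro-to-micro idea can be sketched as a status roll-up: each node displays the worst status of the modules inside it, and zooming in lists only the modules that need attention. In this minimal Python example, the node and module names, the three-level status scale, and the function names (borrowed from the manufacturer's terms) are hypothetical.

```python
from enum import IntEnum
from typing import Dict, List


class Status(IntEnum):
    OK = 0
    WARNING = 1
    FAULT = 2


# Hypothetical hierarchy: node -> module -> current status
NODES: Dict[str, Dict[str, Status]] = {
    "node_1": {"da_3b": Status.FAULT, "encoder_7": Status.WARNING, "mux_1": Status.OK},
    "node_2": {"router_1": Status.OK, "encoder_2": Status.OK},
}


def lean_back() -> Dict[str, Status]:
    """Macro view: each node shows only its worst module status."""
    return {node: max(modules.values()) for node, modules in NODES.items()}


def lean_forward(node: str) -> List[str]:
    """Micro view: modules in one node that are not OK, worst first."""
    modules = NODES[node]
    needing_attention = [m for m, s in modules.items() if s != Status.OK]
    return sorted(needing_attention, key=lambda m: modules[m], reverse=True)


for node, status in lean_back().items():
    print(node, status.name)     # node_1 FAULT, then node_2 OK
print(lean_forward("node_1"))    # ['da_3b', 'encoder_7']
```

Using an ordered status scale means the macro view needs nothing more than a maximum, so a single red node is enough to draw the operator's attention.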
John Luff is a broadcast technology consultant.
Send questions and comments to: firstname.lastname@example.org