Shortly after that magical, wondrous moment when the first two computers were connected together and a message was transmitted or a file was exchanged, the network went down.
At that moment, network and system management was born.
Prior to that climactic moment, most computer-engineering efforts were geared toward getting individual computers to work. Debugging a single computer itself is no small feat, involving significant effort just to determine the area causing the problem. If it's hardware, which component? Memory, disk, peripherals, CPU or some connection between them could be at fault. If software, which piece? The operating system, the application program, the memory manager or a connection between them could be the culprit. Anyone who has spent time on the phone with tech-support sorting out a problem with a single personal computer knows that many factors can be involved.
As soon as you network computers, the number of possible problems multiplies. In most networks, more than one vendor and many technologies are involved. To the possible hardware problems associated with one computer, add problems with cabling, routers, hubs and network cards. To the usual list of software suspects associated with one computer, add network operating systems, network card drivers and router software. The scene can become quite confusing, flush with vendor finger-pointing.
In the late 1980s, as networks grew and managing them became a major problem, several standards initiatives emerged to help vendors and users tackle their mounting problems. Some of the terminology involved can be confusing, but to be successful, IT managers should be familiar with the basic concepts and terms often mentioned in connection with network and system management.
Network vs. System Management
Obviously, a network consists of both the communication channels and the devices using those channels. The communication channels consist of the wires, protocols, software and devices providing the medium through which machines talk to one another. The devices using the network are everything else that lives on the network -- workstations, printers and scanners.
Management of the network itself, including devices such as routers and hubs, is called network management. Management of the devices using the network, such as workstations, is called systems management. In recent years, however, the distinction between the two has become blurry as tools have emerged that give a unified view of both.
SNMP, CMIP and MIBs
Two common protocols to manage Internet devices emerged at approximately the same time: the Simple Network Management Protocol (SNMP) and Common Management Information Protocol (CMIP). Originally, SNMP was intended to fill the short-term need for network management, to be eventually replaced with CMIP for long-term needs. As its name implies, SNMP is a simple protocol -- its original description was only 32 pages -- that was easy to implement. It provides a simple query-type protocol that a network management program uses to ask devices living on the network how they are doing, how busy they are, what errors they have encountered, etc.
When SNMP was published, so was a list of information it could request from various kinds of devices. This list is called a Management Information Base (MIB). The original MIB contains a hierarchical list of objects, starting from the largest and descending to the smallest. The Internet itself is one of those objects -- at some level underneath it are individual devices such as routers, hubs, switches, etc. The MIB lists all the essential variables a particular type of object should make available to a network-management program. For example, the MIB contains a list of the variables routers should maintain and make available. The MIB describes what is available; SNMP defines how to get it.
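The idea can be sketched in a few lines of Python. The OIDs below are real identifiers from the standard MIB's system group (sysDescr, sysUpTime and sysName), but the stored values are invented for illustration -- a real management program would send an SNMP GET over the network to the device's agent rather than read a local table.

```python
# Conceptual sketch of a MIB lookup: the manager knows the OID
# (object identifier) of each variable and asks for it by name.
# OIDs are real MIB-II system-group identifiers; values are invented.

MIB_II_SYSTEM = {
    "1.3.6.1.2.1.1.1.0": "Hypothetical Router OS v1.0",  # sysDescr
    "1.3.6.1.2.1.1.3.0": 123456,   # sysUpTime, in hundredths of a second
    "1.3.6.1.2.1.1.5.0": "router-1",                      # sysName
}

def snmp_get(mib, oid):
    """Simulate an SNMP GET: look up one object instance by its OID."""
    if oid not in mib:
        raise KeyError(f"noSuchObject: {oid}")
    return mib[oid]

print(snmp_get(MIB_II_SYSTEM, "1.3.6.1.2.1.1.5.0"))  # prints the device name
```

The dotted numbers reflect the MIB's hierarchy: each additional number descends one level in the object tree, from the Internet at the top down to a single variable on a single device.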
Because network-management programs know the MIB, they know which variables they can get from which kinds of devices -- to the extent the MIB is supported by the device vendors. SNMP MIB support is nearly universal. CMIP also uses a MIB to know what it can and cannot request from a managed device, but it is a more complex protocol. The original requirement that SNMP and CMIP share the same MIB was subsequently dropped, and the protocols, while retaining similarities, grew apart.
SNMP is widely supported and is commonly used for basic monitoring of network devices. A second version of SNMP (SNMPv2) has been in discussion for many years. The intention of SNMPv2 was to expand the protocol and resolve some of the shortcomings of the original. Unfortunately, because of the additions and extensions to the original SNMP, it probably should no longer be called "simple."
DMI and MIF
While MIBs describe network devices and the information they will make available to management protocols such as SNMP and CMIP, the Desktop Management Interface (DMI) helps solve the problem of how to manage individual desktop computers.
DMI is an agent -- that is, "a program that performs some information-gathering or processing task in the background. Typically, an agent is given a very small and well-defined task." It runs on a PC and accesses a locally kept Management Information Format (MIF) file containing descriptions of all the devices in the computer that can be managed. Hardware vendors, such as disk or network-card makers, provide information about their products that is stored in the MIF. Management programs then ask DMI to retrieve the information on individual components. Because DMI provides a standard interface, management programs do not have to keep track of how to get information on a particular component -- they just need to "speak DMI." DMI gets the requested information out of the MIF and returns it to the requester. This data can be used for such things as taking inventory of workstation hardware.
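The pattern is easy to see in miniature. In this sketch, vendors register their components' attributes (a stand-in for real MIF entries) and a management program retrieves them through one uniform interface instead of talking to each component's own tooling. The component names and attributes are invented for illustration.

```python
# Simplified sketch of the DMI idea: one interface between management
# programs and vendor-supplied component descriptions. Names and
# attribute layouts are hypothetical, not the real DMI/MIF format.

class DesktopManagementInterface:
    def __init__(self):
        self._mif = {}  # component name -> attribute dict (stand-in for the MIF)

    def register_component(self, name, attributes):
        """A vendor installs its component's description at setup time."""
        self._mif[name] = dict(attributes)

    def get_attribute(self, component, attribute):
        """A management program asks DMI, not the hardware directly."""
        return self._mif[component][attribute]

dmi = DesktopManagementInterface()
dmi.register_component("network card",
                       {"vendor": "ExampleCo", "mac": "00:11:22:33:44:55"})
print(dmi.get_attribute("network card", "vendor"))
```

An inventory tool built on this interface never needs vendor-specific code; it only needs to know how to ask DMI.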
DMI has also been extended to manage MIF information provided by software vendors, thereby making software part of the overall management picture.
Frameworks and Agents
System and network management are complex and ever-changing. Many tools are on the market for tackling them. Tools such as Visio Enterprise extend an existing core graphical technology to help administrators build physical or logical maps of their networks. These kinds of programs can be very helpful in tracking what exists, and where and how things are connected. In the case of Visio Enterprise, the network mapping functionality is part of a suite of related tools that help IT administrators graphically represent and manage the enterprise.
However, the larger trend in network and system management is toward management frameworks. The two best known are HP's OpenView and IBM's Tivoli Management Environment (TME). Both include the concept of a management console or workstation that an administrator uses to view and manage the network and the objects living on it -- the managed objects.
Both supply a framework in which management tools can operate. The framework provides basic services, such as a unified look and feel, a central database for storing information about managed objects and the underlying mechanism for communicating to the managed objects or to other management consoles. A vendor may develop a program to manage a new kind of high-speed communication device. Instead of having to develop a whole new standalone product, the vendor can write the management tool so it will "snap in" to the existing management framework. As a snap-in, the program can take advantage of the framework's basic services. The vendor is saved the time and expense of building an infrastructure for the tool and the end user is saved the headache of having to learn a new interface and communication scheme.
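The snap-in relationship can be sketched as a simple plugin registry: the framework owns the shared services (here, just a central event log), and a vendor tool plugs in and uses them rather than building its own infrastructure. All names here are hypothetical; real frameworks expose far richer services.

```python
# Sketch of the "snap-in" pattern: the framework provides shared
# services, and vendor tools register with it instead of shipping
# standalone infrastructure. All class and method names are invented.

class ManagementFramework:
    def __init__(self):
        self._snapins = {}
        self.event_log = []            # shared service: central event store

    def register(self, name, snapin):
        self._snapins[name] = snapin
        snapin.attach(self)            # hand the snap-in the framework's services

    def run(self, name):
        return self._snapins[name].poll()

class LinkMonitorSnapIn:
    """A vendor tool managing a hypothetical high-speed communication link."""
    def attach(self, framework):
        self.framework = framework

    def poll(self):
        status = "link up"             # a real tool would query the device here
        self.framework.event_log.append(status)   # reuse the shared service
        return status

fw = ManagementFramework()
fw.register("link-monitor", LinkMonitorSnapIn())
print(fw.run("link-monitor"))
```

The vendor's code is only the `LinkMonitorSnapIn` class; logging, storage and the console view come for free from the framework.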
Both OpenView and TME also provide a growing set of tools that live inside the framework. The tools can do such things as generate a map of network devices, monitor SNMP variables, report when a variable reaches or exceeds a threshold set by the administrator and help isolate network and performance bottlenecks.
Both platforms also provide agents that can be installed on workstations or network servers. These agents independently monitor the devices on which they are installed, only reporting items of particular note to the management console. This approach helps alleviate unnecessary network traffic that could occur if thousands of workstations were sending routine information to a central management console. In many cases, the agents can be configured to repair common problems.
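The traffic-saving idea behind such agents is simple filtering: sample locally, forward only readings that cross an administrator-set threshold, and let routine data die on the workstation. The metric names and values below are invented for illustration.

```python
# Sketch of agent-side filtering: only out-of-range readings are
# forwarded to the management console; routine samples never leave
# the workstation. Sample data is invented.

def filter_alerts(samples, threshold):
    """Return only the (metric, value) readings worth reporting."""
    return [(metric, value) for metric, value in samples if value >= threshold]

# One polling cycle's local measurements (percent utilization):
samples = [("cpu", 12), ("cpu", 97), ("disk", 55), ("cpu", 91)]

alerts = filter_alerts(samples, threshold=90)
print(alerts)  # only the two readings at or above 90 reach the console
```

With thousands of workstations, this is the difference between a console receiving a handful of alerts and being flooded with routine status reports.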
Deciding whether the HP or IBM framework is best depends on a close examination of the existing network and the exact management requirements. Neither solution is inexpensive, either in initial cost or in the expertise it takes to learn and use the system effectively. However, both are far cheaper than the astronomical costs associated with an unmanaged, unmonitored large network.
The good news about that first network that went down is that the engineers located the problem pretty quickly -- after all, there were only two computers involved -- and the network came back up in short order. Not only did it come back up, but it came back up more solidly than before and stayed up longer.
What seems like only a couple of days later, we had the Internet and large agency intranets. Today, network equipment is far more stable and reliable than it used to be, but because networks are so much larger, it is sometimes harder to isolate a problem.
Fortunately, as networks have grown, so too have the options for managing those networks. For those buried under large, unmanaged networks, the appearance of such tools has come none too soon.
David Aden is a senior consultant for webworld studios, an application-development consultancy in Northern Virginia. Email