Network management moves into the limelight

The importance given to network management can make all the difference in gaining an edge on the competition.

CHRIS FORD, Équipe Communications Corp.

Like death and taxes, it used to be inevitable that network management would show up at the bottom of many service providers' lists of important features for routers, switches, and other platforms, commonly called network elements. For many service providers, network management seemed more like a necessary evil than a strategic partner: an ugly stepchild and pure overhead. But those who recognize the importance of network management are currently enjoying competitive advantages in the marketplace.

Now, due largely to the surging dependence on broadband and optical networks across the world and the lack of skilled professionals to operate these networks, the day of network management may have finally arrived.

Bigger networks can result in bigger, more catastrophic meltdowns. More users mean higher-capacity network elements and greater administrative complexity. Yet service providers know they must roll out new services and take on new users to succeed in today's intensely competitive environment. Add these up, and it becomes clear why network management takes on increased importance.

For the large network elements that work at the edge or within the optical-core network (and those that typically populate service providers' regional offices), today's network requires critical element-management functionality across the collection of heterogeneous network elements that make up a carrier's network. This management enables high reliability and service availability. Element management is the lowest layer in the telecommunications management network (TMN) layered model.

In the TMN model, element management provides the foundation for the next layer up, network management, which in turn is the basis for service management and finally business management. Element management is vital because it supplies the fundamental information on element configurations and behaviors. Collecting and assimilating element-management data across all network elements is fundamental to operating performance and to ensuring overall network availability.

Each element's network-management system (NMS) is primarily responsible for keeping the system running or restoring it after a failure. But because the NMS also supplies information to higher-layer management modules, its ability to monitor the network element and deliver accurate information quickly is equally important. If an element within a network experiences a problem, the entire network may be affected.

To avoid continuous and time-consuming upgrades in the management lifecycle, the network-management software must be able to cope with increased demands as the network grows, while providing complete redundancy that maximizes network availability.

To service providers and their customers, network management is important for other reasons, too. To upper-level managers, network management can be a competitive differentiator if it permits the service provider to write more aggressive service-level agreements (SLAs). It helps the provider demonstrate delivery on existing SLAs, as well. Also, effective element network management creates a strong foundation for rolling out new service offerings and provisioning new users quickly. Overall, strong network management lets providers reduce the time spent on problem troubleshooting while increasing their time for pursuing revenue-generating opportunities.

Some customers may not feel the value of network management directly, but they know the difference when their data or voice service degrades. They experience the lack of effective network management when they are impacted by problems that go unnoticed by their service provider. Other customers want to look into their slice of the carrier's network pie and see actual statistics on their network connections to validate for themselves that they are getting the services they paid for.

To do its job, element management should possess these characteristics:

Operational support-system (OSS) integration flexibility. Each vendor's management systems should deliver full coverage of the International Organization for Standardization (ISO) functions: fault, configuration, accounting, performance, and security management. Moreover, solutions should be standards-based so they can be readily integrated into the service provider's end-to-end OSS.
Internal data integrity. Synchronization problems between the NMS and each network element must be reduced or, ideally, eliminated. With older-generation systems, network-management functions were added onto the existing network-element hardware/software architecture. As a result, the network element had to expend extra effort to synchronize the NMS with information relating to its state (e.g., circuits in use and actual configuration). These afterthought NMSs increased complexity and reduced software reliability.
Scalability. The ability of a network element to cope smoothly with increased demands (see Figure) is fundamental to creating extensible, highly available carrier-class networks. Networks are growing exponentially with no end in sight, and since the scalability of OSSs is directly related to the scalability of their parts, equipment vendors must provide management solutions that can be expanded as the carrier's network grows. Ideally, they must design scalability into their system solutions from inception. Scalability cannot be retrofitted or added on; it must be an integral part of the architecture of a vendor's system solution.
Robust availability features. Strictly speaking, availability should be a focus of each network element's system architecture. Next-generation network elements should feature tight integration between the NMS and embedded-system software and should proactively manage all aspects of the network element to maximize the availability of network services. Redundant components, data replication, standby software, and constant, comprehensive system health monitoring are all examples whereby network management is tightly integrated with embedded software. The resulting design maximizes network-element availability and, hence, network availability.

With network topologies and technologies undergoing massive (and, some would say, unpredictable) changes, the time is right to consider new methodologies and equipment to accommodate next-generation networks. As part of the process, service providers should evaluate the NMS and a new network element as a system, rather than a set of loosely coupled products.

In some cases, the quality of the NMS may prove even more critical in the long run than the speeds and feeds of a given network element. After all, performance and capacity can be upgraded, but a poorly designed NMS stands little chance of improving over time. Service providers, therefore, must ask network-element vendors some important questions.

Does the NMS support real-time, transaction-quality, network-element configuration?

This support gets to the heart of NMS implementation quality and the fundamental question of whether the NMS is designed into, rather than added onto, the network element. Older-generation network elements frequently use simple network-management protocol (SNMP) exclusively to configure new services. Many services require many "SNMP sets" and demand that either all of the "sets" be completed or, in the event of failure, that none of them be completed. Since each SNMP set may succeed or fail independently, "dangling" partial configuration records are a real danger.

For example, it may take 10 separate steps to provision a service within a network element. If there's a failure (in the network element or out on the line) during setup, the operator has to roll back each step. In a transaction-free NMS, if one of the unwind steps also fails, it's likely the configuration data will now be completely out of sync or contain dangling invalid configuration records. The operator's only recourse is to go back and re-synchronize the entire network element with a backup, a massive job that threatens to keep the network element offline for an extended time.

Newer-generation NMSs are introducing the notion of transaction-quality operations, where an entire configuration request completes or fails. Transactions ensure configuration data remains intact, greatly facilitating the scalability of carrier-class management systems.
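The all-or-nothing behavior described above can be illustrated with a minimal Python sketch. It is not any vendor's implementation: the `ElementConfig` class, the `provision` helper, and the convention that a step with value `None` simulates a failed set are all assumptions made for illustration. The idea is simply to stage every step on a private copy and commit with a single swap, so a mid-sequence failure leaves no dangling records.

```python
import copy


class ProvisioningError(Exception):
    """Raised when a single configuration step cannot be applied."""


class ElementConfig:
    """Hypothetical in-memory stand-in for a network element's configuration."""

    def __init__(self):
        self.records = {}

    def apply_transaction(self, steps):
        """Apply every (key, value) step, or leave the configuration untouched."""
        staged = copy.deepcopy(self.records)  # work on a private copy
        for key, value in steps:
            if value is None:  # assumption: None simulates a failed step
                raise ProvisioningError(f"step {key!r} failed")
            staged[key] = value
        self.records = staged  # commit: one swap, so no partial state is visible


def provision(config, steps):
    """Return True if the whole request was committed, False if it was rejected."""
    try:
        config.apply_transaction(steps)
        return True
    except ProvisioningError:
        return False
```

Because nothing is written to the live records until every step has succeeded, there are no unwind steps that could themselves fail.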

Older-style network elements typically rely on an NMS that is directly connected to the network element. As the number of network elements grows, and as more users and administrators request information from the network operations center, the network elements are forced to support many "like" management requests. That can overload the network element and generate excessive network traffic.

By contrast, newer-generation network elements use multitiered client/server architectures that place one or more NMS servers, and their related information repositories and other equipment, in the center tier. From here, the server can monitor and control numerous network elements, figuratively residing at the architecture's lower-most layer. At the same time, the server can easily communicate upward to multiple NMS clients that run, either via Web browsers or within management applications, at the top-most layer of the architecture.

With a multitiered architecture, the server is capable of handling literally hundreds of network elements and thousands of clients, all without adding to the overhead at the network element. Be wary of any vendor who answers "yes" to the inevitable scalability question but whose NMS doesn't employ multitiered client/server architecture.
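A small Python sketch can show why the middle tier keeps client load off the network element. The class names, the `read_status` method, and the status fields are hypothetical; the sketch assumes the server refreshes its repository from each element and then answers every client query from that repository.

```python
class NetworkElement:
    """Hypothetical element that counts how many status requests it serves."""

    def __init__(self, name):
        self.name = name
        self.requests_served = 0

    def read_status(self):
        self.requests_served += 1
        return {"name": self.name, "alarms": 0}


class NmsServer:
    """Middle tier: queries each element once per refresh, answers clients from its repository."""

    def __init__(self, elements):
        self.elements = elements
        self.cache = {}

    def refresh(self):
        # One request per element, regardless of how many clients exist.
        for element in self.elements:
            self.cache[element.name] = element.read_status()

    def client_query(self, element_name):
        # Served from the server's repository; the element is not contacted.
        return self.cache[element_name]
```

In this sketch, a thousand client queries after a single refresh cost the element exactly one request, which is the point of placing the server between clients and elements.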

Does the NMS push accounting updates and historical statistics updates to upper-management layers?

While the benefits of reduced data synchronization are obvious, there is another part of the NMS process that can do without synchronization: the regular ongoing transfer of accounting or historical data snapshots to the higher-level OSS modules.

In older architectures, the NMS accounting and historical statistics system had to know the network element's state at all times, so it relied on constant updates to stay synchronized with the network element's embedded-system software. When an OSS subsystem requested its own information update (this might happen every minute or two), the management system would pull out a snapshot. Since this process required substantial overhead and was extremely complex, it wasn't good for day-to-day system health, nor did it encourage scalability.

By using push instead of pull technology, newer-generation network elements accomplish the same thing, but in a stateless, asynchronous environment. Because the internal state of a network element is always in sync, the network element simply pushes out its data to the server at regular intervals. The server wakes up at preset intervals and collects the snapshots, which is a simpler process that greatly facilitates scalability.
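The push pattern can be sketched in a few lines of Python. This is an illustration under assumptions, not a description of any product: the `PushingElement` and `CollectingServer` classes and the single `octets` counter are invented for the example, and an in-process queue stands in for the network between element and server.

```python
import queue


class PushingElement:
    """Hypothetical element that pushes counter snapshots without being asked."""

    def __init__(self, name, inbox):
        self.name = name
        self.inbox = inbox
        self.octets = 0

    def carry_traffic(self, octets):
        self.octets += octets

    def push_snapshot(self):
        # Called by the element on its own schedule; no server request is needed.
        self.inbox.put({"element": self.name, "octets": self.octets})


class CollectingServer:
    """Hypothetical server that wakes up and drains whatever has arrived."""

    def __init__(self):
        self.inbox = queue.Queue()
        self.history = []

    def collect(self):
        """Store all snapshots received since the last wake-up; return how many."""
        collected = 0
        while True:
            try:
                snapshot = self.inbox.get_nowait()
            except queue.Empty:
                break
            self.history.append(snapshot)
            collected += 1
        return collected
```

The server keeps no model of the element's state between wake-ups; it only stores what was pushed, which is what makes the exchange stateless and asynchronous.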

Is the NMS easy to use and easily integrated into the existing OSS?

Time-to-market and operational cost reductions are key to the success of a carrier. Networks are growing at astronomical rates, so rapid integration and easy-to-use interfaces allow carriers to react quickly and efficiently.

For ease of use, look for a graphical user interface that leverages functions with which network administrators are familiar: an interface that supports tabbed dialogs, tree views, and stoplight color-coding, while offering extended support such as the ability to perform active and passive monitoring. For ease of integration, expect to see support for multiple levels of standards-based integration using such standards as SNMP or common object request broker architecture (CORBA).

Finally, a new-generation NMS should be able to manage itself. Look for comprehensive statistics monitoring tools that measure NMS response times and other parameters involved with day-to-day monitoring and management. These statistics can be used to fine-tune the NMS generally, and they can supply valuable knowledge when the time comes to expand the NMS by adding newer servers or other components.
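As a rough illustration of self-monitoring, the Python sketch below records how long NMS operations take and summarizes the samples. The `ResponseTimer` class and its method names are hypothetical, and the choice of count, mean, and maximum as the reported statistics is an assumption; a real NMS would likely track more parameters.

```python
import statistics
import time


class ResponseTimer:
    """Hypothetical collector of NMS response-time samples, in seconds."""

    def __init__(self):
        self.samples = []

    def record(self, seconds):
        self.samples.append(seconds)

    def timed(self, operation):
        """Run a zero-argument callable, record its duration, return its result."""
        start = time.perf_counter()
        result = operation()
        self.record(time.perf_counter() - start)
        return result

    def summary(self):
        """Return count, mean, and max of the samples, or None if there are none."""
        if not self.samples:
            return None
        return {
            "count": len(self.samples),
            "mean": statistics.mean(self.samples),
            "max": max(self.samples),
        }
```

A rising mean or maximum over time is the kind of signal that could prompt tuning or the addition of servers.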

As for overall network-element evaluation, service providers should examine the entire solution provided by the vendor (i.e., the software as well as the hardware) within which the NMS operates. Everyone expects hardware to be ultra reliable today, and for the most part, it is. But it's still wise to check for system software reliability, through features such as a distributed control architecture, protected-memory operation, and proactive system health monitoring.

Taken together, the NMS and the network element's embedded software should be able to deliver substantial competitive benefits and help service providers win and hold onto new customers, even in the midst of the technical evolutions that we'll face in the days ahead.

Chris Ford is senior product manager at Équipe Communications Corp. (Acton, MA). He can be reached via the company's Website, www.equipecom.com.