Adding QoS to optical networks
Network complexity and customer demands require a new generation of service-level-agreement reporting systems.
BY KRIS IYER, Clear
Service providers are constantly looking for new strategies to help them meet the pressures of today's rapidly changing and increasingly competitive telecommunications market. New technology, new services, increasing bandwidth, and increasing network complexity are imposing new challenges. In particular, the growth of the Internet, new multimedia applications, and the soaring increase in "mission-critical" electronic commerce are driving continually increasing customer demands for high-quality service from their telecommunications service providers.
At the same time, the critical issue facing most telecom service providers is: "How can we differentiate ourselves from the competition?" Some service providers position themselves against their competitors strictly on price. Although this strategy can offer some temporary success, competitors are likely to quickly match any discounts. And if everyone competes strictly on price, profit margins are bound to eventually disappear. For long-term sustainable growth, a provider needs to truly differentiate itself.
In some businesses, differentiation can be based on either product capabilities or service quality. But in telecommunications, product capabilities generally come from network technology, which is readily available to all competitors. That ensures any lead in product will be short-lived, at best. As in many other highly competitive businesses, service quality is the best way for a telecom service provider to achieve a sustainable competitive advantage.
Customers, particularly business customers, are looking for both low price and high quality from their service provider. They also want one provider to act as "general contractor" for their end-to-end network, taking responsibility for overall service quality, even if segments of the network are obtained from other, subcontracting carriers.
Today's networks, based on new optical-networking technologies, are in fact much more reliable than were previous generations. Furthermore, newer intelligent network elements provide a wealth of performance data that can, in theory, be used to monitor service quality, detect trouble early, and improve quality. Performance data can also be used to objectively measure and prove superior reliability to demanding customers.
But most service providers are failing to meet customers' rising expectations for quality of service (QoS). And even service providers who offer superior reliability and quality cannot easily prove that they are any better than their competitors.
The inability to manage and measure service quality inhibits the service provider from differentiating its product offerings. It is difficult, if not impossible, to develop and offer premium-grade or managed services to high-value customers at higher prices. The provider is doomed to be just another commodity supplier, competing only on price.
Today, more and more customers are seeking, and getting, service-level agreements (SLAs) from their telecom service providers. These agreements specify the levels of availability, performance, and responsiveness that will be delivered to the customer, and they establish penalties for failure to comply.
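As a rough illustration of how an availability commitment and its penalty might be evaluated, consider the following minimal sketch. The 99.9% target, the 10% credit, and every name in it are assumptions chosen for the example; actual SLA terms vary by contract.

```python
from dataclasses import dataclass


@dataclass
class SlaTerms:
    """Hypothetical SLA terms; real agreements define their own metrics."""
    availability_target_pct: float  # e.g. 99.9
    credit_pct_if_missed: float     # share of the monthly fee credited back


def availability_pct(outage_minutes: float, period_minutes: float) -> float:
    """Percentage of the reporting period during which service was up."""
    return 100.0 * (period_minutes - outage_minutes) / period_minutes


def credit_due(terms: SlaTerms, outage_minutes: float,
               period_minutes: float, monthly_fee: float) -> float:
    """Return the penalty credit owed when the availability target is missed."""
    achieved = availability_pct(outage_minutes, period_minutes)
    if achieved >= terms.availability_target_pct:
        return 0.0
    return monthly_fee * terms.credit_pct_if_missed / 100.0


period = 30 * 24 * 60                    # minutes in a 30-day month
terms = SlaTerms(availability_target_pct=99.9, credit_pct_if_missed=10.0)
print(credit_due(terms, outage_minutes=90, period_minutes=period,
                 monthly_fee=1000.0))
```

With these assumed terms, 90 minutes of outage in a 30-day month falls below the target and triggers the credit, while 30 minutes does not.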
Meanwhile, aggressive service providers have developed a successful strategy for competitive differentiation based on superior service quality. The SLA defines, communicates, and clarifies the provider's quality objectives and commitments. The agreement also establishes an objective means for provider and customer to measure, evaluate, and improve performance against those objectives. The SLA is therefore a critical component of the service provider's competitive market strategy. The result: improved competitive position as well as lower expenses and increased revenue.
This strategy has three steps. First, implement a proactive service-level management program that improves quality by monitoring customer service against SLA objectives. SLA monitoring allows early identification of trouble spots and degrading performance trends, so problems often can be repaired before customers experience a disruption in service. Next, build on the program by offering upgraded premium and managed services to major customers, based on the ability to proactively monitor their circuits. Guarantee quality with SLAs and perhaps command higher prices for upgraded levels of service quality. Finally, implement automated SLA reporting, which provides customers with objective proof of the provider's superior reliability and responsiveness.
Proactive SLA management and automated SLA reporting to customers have never been easy, and today's optical networks impose some additional requirements of their own. To be effective, the provider's SLA management systems must overcome several challenges.
Processing large amounts of data. Today's optical networks generate vast amounts of fault and performance data. Additionally, customer service is increasingly reliant on the performance of higher-level shared transport facilities. When these shared facilities run into trouble, hundreds, even thousands, of messages can be generated. SLA management requires that all of this performance data be collected and processed.
Making sense of all this data manually is virtually impossible. That difficulty has long stymied efforts at automated SLA reporting and has led to investments in first-generation network-management systems to collect and process the data. But these systems are severely challenged by today's rapidly growing advanced networks. All of the required data must be collected and processed quickly enough to provide useful reports. Otherwise, valuable time can be lost before maintenance personnel are notified of troubled circuits, and customers must wait too long for their reports.
The result is that service providers cannot implement full-scale automated SLA reporting with these first-generation management systems. It would require too much manual effort to prepare reports, and extra staff is not available. In contrast, a next-generation SLA reporting system collects, filters, and analyzes all the performance data from all network monitoring points and other data sources in real time.
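The filtering step can be pictured with a small sketch that collapses a burst of raw messages into one summary row per facility and alarm type. The message fields and facility names here are hypothetical, and a production system would also have to handle time windows and streaming input.

```python
from collections import Counter


def summarize_burst(messages):
    """Collapse raw alarm messages into one row per (facility, alarm type)."""
    counts = Counter((m["facility"], m["alarm"]) for m in messages)
    return [
        {"facility": facility, "alarm": alarm, "count": n}
        for (facility, alarm), n in sorted(counts.items())
    ]


# Illustrative burst: a shared facility failure floods the system with
# repeated messages, alongside a couple of unrelated alarms.
burst = (
    [{"facility": "ring-1", "alarm": "LOS"} for _ in range(1000)]
    + [{"facility": "ring-2", "alarm": "AIS"} for _ in range(2)]
)
summary = summarize_burst(burst)
```

Here 1,002 raw messages reduce to two summary rows, which is the kind of reduction that makes downstream analysis tractable.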
Correlating network data to customer services. SLA reporting systems must generate meaningful customer summary reports based on network data. Fault and performance data from intelligent network elements is associated with individual monitoring points for relatively short intervals of time. To be meaningful, this data must first be correlated to the individual customer circuits served by those monitoring points. Then, statistics and indices for the customer service can be calculated and displayed for broader time periods.
Many first-generation management systems cannot automatically correlate network-element data to the underlying customer circuits. In contrast, next-generation SLA reporting systems automatically download circuit assignments, layout, and configuration details from inventory systems and automatically update this information as changes occur. The SLA reporting system then uses this self-generated model of the service provider's network to automatically analyze the customer service impact of the fault and performance data provided by intelligent network elements (see Figure 1).
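In its simplest form, this correlation amounts to a lookup from monitoring point to circuits, followed by aggregation per circuit. The sketch below assumes a hypothetical inventory extract and record layout, and it simply sums errored seconds across monitoring points, which is a simplification of how a real system would derive circuit-level statistics.

```python
from collections import defaultdict

# Hypothetical inventory extract: monitoring point -> circuits it serves.
INVENTORY = {
    "MP-ring1-node3": ["CKT-1001", "CKT-1002"],
    "MP-ring1-node4": ["CKT-1002"],
}

# Hypothetical short-interval performance records from network elements.
RECORDS = [
    {"point": "MP-ring1-node3", "errored_seconds": 12},
    {"point": "MP-ring1-node4", "errored_seconds": 5},
    {"point": "MP-ring1-node3", "errored_seconds": 3},
]


def errored_seconds_by_circuit(records, inventory):
    """Attribute each monitoring-point record to the circuits it serves."""
    totals = defaultdict(int)
    for rec in records:
        # Records from points missing in the inventory are skipped here.
        for circuit in inventory.get(rec["point"], []):
            totals[circuit] += rec["errored_seconds"]
    return dict(totals)


per_circuit = errored_seconds_by_circuit(RECORDS, INVENTORY)
```

The per-circuit totals can then be rolled up over days or months for the customer-facing summary.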
Tracking dynamic network reconfigurations. To analyze the customer-service impact of network troubles, the SLA reporting system must understand the physical topology and configuration of the network and the customer circuits the network carries. In the past, network topologies were relatively simple and static. But today's optical networks feature multiple SONET/SDH rings and DWDM/optical-crossconnect mesh topologies. Now, each mesh enables 1:N route diversity, with automatic reconfiguration by each optical crossconnect to restore service. Furthermore, these dynamic reconfigurations of the network, and the customer paths through the network, are unknown to the provider's inventory system.
The complexity of optical networks and their ability to dynamically reconfigure themselves present an imposing challenge to service impact analysis. Manual correlation is virtually impossible, and even first-generation automated systems are unable to track automatic changes in the network and apply those changes to ongoing fault and performance data (see Figure 2).
In contrast, a next-generation SLA reporting system maintains a dynamic model of the network by discovering the initial configuration from the provider's inventory systems, then continually collecting and processing updates whenever intelligent network elements reconfigure customer service onto alternative paths. The next-generation system then uses its dynamic model of the network to dynamically correlate alarm and performance data to the affected customer circuits and services. The result is a history of performance against SLA objectives for each customer circuit.
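One way to picture such a dynamic model is as a time-ordered route history per circuit, so that an alarm is matched against the path the circuit occupied when the alarm occurred. The class below is a minimal sketch under that assumption; the names are hypothetical, and it assumes route updates arrive in time order.

```python
from bisect import bisect_right


class CircuitRouteHistory:
    """Time-ordered record of which facilities each circuit traversed."""

    def __init__(self):
        self._times = {}  # circuit -> list of route-change times (ascending)
        self._paths = {}  # circuit -> list of facility sets, aligned with _times

    def record_route(self, circuit, at, facilities):
        """Store the initial route or a reconfiguration reported at time `at`."""
        self._times.setdefault(circuit, []).append(at)
        self._paths.setdefault(circuit, []).append(frozenset(facilities))

    def path_at(self, circuit, at):
        """Return the facilities the circuit was using at time `at`."""
        idx = bisect_right(self._times.get(circuit, []), at) - 1
        return self._paths[circuit][idx] if idx >= 0 else frozenset()

    def circuits_affected(self, facility, at):
        """Circuits riding on `facility` when an alarm occurred at time `at`."""
        return sorted(c for c in self._times if facility in self.path_at(c, at))


history = CircuitRouteHistory()
history.record_route("CKT-1001", 0, ["span-A", "span-B"])    # from inventory
history.record_route("CKT-1002", 0, ["span-A"])              # from inventory
history.record_route("CKT-1001", 100, ["span-C", "span-D"])  # crossconnect reroute
```

An alarm on span-A at time 50 affects both circuits, while the same alarm at time 150 affects only CKT-1002, because CKT-1001 had already been moved to its alternative path.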
Creating partitioned SLA reports for each customer subnetwork. SLA reports for the entire network as a whole are useful for measuring and tracking overall quality. But to objectively measure and prove the quality of service that a particular customer is receiving, the SLA reporting system must create partitioned reports. These reports separately calculate and display the performance of only that part of the network that particular customer is using. Details, statistics, and trends must be aggregated and presented for that customer's circuits alone.
First-generation management systems cannot easily and automatically create partitioned SLA reports. As a result, either scarce staff must partition the reports manually or, more likely, partitioned reports simply are not available. With next-generation SLA reporting systems that dynamically analyze service impact, subnetwork partitions for individual customers can be easily created and maintained (see Figure 3).
Reports can also be partitioned by supplier, covering the segments obtained from each subcontracting carrier, which allows the provider to compare quality across suppliers, discover problem suppliers, and reward the best suppliers. Likewise, partitioning by geographic region or type of circuit can help internal quality-improvement efforts by separately tracking the efforts of each work group and identifying problem areas for corrective action.
Partitioned SLA reports ensure that each customer, subcontractor, or work group sees summaries, indices, and details specific only to its part of the overall network.
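Partitioning of this kind can be sketched as grouping per-circuit results by a partition key and summarizing each group separately. The customer names, circuit IDs, and outage figures below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical mapping of circuits to the partition they belong to.
CIRCUIT_OWNER = {"CKT-1001": "Acme", "CKT-1002": "Acme", "CKT-2001": "Globex"}

# Hypothetical per-circuit outage minutes for the reporting period.
OUTAGE_MINUTES = {"CKT-1001": 0.0, "CKT-1002": 30.0, "CKT-2001": 5.0}


def partitioned_report(partition_of, outage_minutes):
    """Summarize outage data separately for each partition."""
    groups = defaultdict(dict)
    for circuit, minutes in outage_minutes.items():
        groups[partition_of[circuit]][circuit] = minutes
    report = {}
    for partition, circuits in groups.items():
        report[partition] = {
            "circuits": len(circuits),
            "total_outage_minutes": sum(circuits.values()),
            "worst_circuit": max(circuits, key=circuits.get),
        }
    return report


report = partitioned_report(CIRCUIT_OWNER, OUTAGE_MINUTES)
```

Passing a different mapping, such as circuit to supplier or circuit to region, would yield the other partitions described above without changing the function.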
Flexible SLA reporting. First-generation management systems often contain a few "hard-wired" reports, but these reports cannot be easily modified to suit the evolving needs of internal users, managers, and customers.
Report requirements can be expected to change constantly as new technology, services, and organizational strategies develop. When report formats are "hard-wired," either new types of reports must be compiled manually or programming changes must be made. Manual report generation requires additional clerical effort. Programming changes to existing management systems can be expensive and take a long time to implement.
In contrast, next-generation SLA reporting systems should provide a wide variety of flexible reporting options out of the box. Examples of such reports include detailed historical reports for analyzing individual circuits over time, statistical reports on groups of circuits, reports for comparing individual circuits within a group, and reports for comparing groups to each other. Additionally, next-generation systems should include the capability for providers to customize their own variations and enhancements, without the need for conventional programming.
Distributing SLA reports over the Web. Even automated reporting systems can require large amounts of expensive manual effort to schedule and supervise the large-scale distribution of SLA reports. Preparing and distributing reports for internal users, or even for a few very special customers, require significant effort. Distributing reports on a regular basis to hundreds or thousands of customers would create a major staffing problem.
To improve the speed, efficiency, and quality of customer interactions, many service providers offer their wholesale and retail customers access to management functions and reports over the Internet. Web gateways also provide fast, cost-efficient access to network data for the service provider's own field personnel and managers.
Web-based SLA reporting requires that each customer see only data about its own circuits and services and its own portion of the service provider's shared network. A secure, partitioned view of each customer's virtual network is required. Once again, that requires a next-generation SLA reporting system that dynamically correlates network data to its impact on individual customer services.
The soaring increase in "mission-critical" electronic commerce is driving customers to demand continually increasing levels of high-quality service from their telecommunications service providers. Many of these service providers are beginning to offer SLAs that guarantee minimum levels of various performance metrics. These agreements allow the provider to compete and differentiate based on service quality as well as cost.
To proactively manage SLAs in today's next-generation optical networks, providers will require next-generation SLA reporting systems that can:
- Handle large amounts of data in real time.
- Correlate network fault and performance data to its impact on customer service.
- Track dynamic network reconfigurations over diverse routing.
- Create SLA reports that are partitioned and aggregated by customer.
- Enable providers to easily modify and customize SLA reports.
- Enable providers to distribute SLA reports over the Web.
The most critical challenge is that effective SLA management requires the automatic correlation of network data to customer circuits. That is a particularly difficult task in today's optical networks where intelligent optical crossconnects can automatically reconfigure the customer service over multiple alternative paths. Fortunately, a new generation of customer-centric network-management systems can provide this dynamic correlation capability.
Kris Iyer is vice president of business development at Clear (Lincolnshire, IL).