Assessing the impact of optical protection with synchronous resilience

May 1, 2000

When is optical protection best, when is synchronous best, when must they work together? Serious questions for network managers upgrading their SDH or SONET networks.

Antonio Fontalba
Telefonica

Optical networking is becoming the basis for today's telecommunications networks and will probably constitute an optical layer in the future, not only to provide high-speed transport services for the core networks but also for metropolitan access environments.

Today, Synchronous Digital Hierarchy (SDH) or Synchronous Optical Network (SONET) is the most widely used technology for building transport networks. Both have resilience mechanisms that provide client layers with the necessary availability for the services they supply.

The deployment of wavelength-division multiplexing (WDM), which leads to the establishment of the lowest-level transport layer, could have a considerable impact in topologies that have been supplied with protection (or restoration) via synchronous methods such as those employed by SDH and SONET.

To address this problem, the optical topology to be considered is the WDM optical ring (the first consistent optical topology), which may carry several or all of the synchronous aggregates. Both situations are studied in two scenarios that illustrate the effect of optical node or link failures and the survivability interworking strategy that would minimize the impact of these failures on the provided services.

Originally, the only topology used to deploy WDM technology was point-to-point, consisting of two optical terminal multiplexers (OTMs), a series of optical line amplifiers (OLAs), and some fixed optical add/drop multiplexers (OADMs). In such a situation, WDM does not constitute a true optical layer but an isolated technology that can enhance the exhausted capacity of a route that lacks sufficient optical fiber.

Some vendors have introduced OTMs capable of implementing optical resilience. Protection mechanisms can be used for all of the optical multiplex section (OMS) or for some of the optical channels. The former is known as OMS protection (OMSP) or optical section protection (OSP) and the latter as optical channel protection.

However, these protection mechanisms have some drawbacks. Optical channel protection covers only the OTM interfaces; it does not provide the resilience required against OMS failures (e.g., fiber cuts). OSP, on the other hand, can protect the whole OMS in an end-to-end fashion. But when OADMs are inserted along the route, OSP has to be implemented between every pair of optical elements (OTMs and OADMs), and the route diversity necessary to deploy this mechanism cannot always be found.

Most of the operators who have deployed auto-restorable synchronous rings have not considered optical protection because the synchronous method gives them the right resilience for service provisioning. For them, WDM optical protection, far from helping, would add redundancy (and cost) in protection and probably some undesired effects when both survivability mechanisms try to protect at the same time.

During 1999 and continuing this year, many vendors have introduced new OADMs with some interesting features: a higher degree of insertion/dropping of optical channels, some flexibility in configuration, optical switching and bridging for protection, etc. With these new OADMs (which are still far from fully crossconnecting the east and west aggregates with the input and output ports in a programmable way), new and quite useful optical-ring topologies can be implemented in both the backbone and access networks.

Optical rings can be self-restorable. Most vendors are considering adapting the protection mechanisms already mentioned (OSP and optical channel protection) to the ring topology. Either mechanism can act when a failure occurs. Optical channel subnetwork-connection protection (OCh-SNCP), as its name implies, protects the optical channel, while OMS shared protection ring (OMS-SPRing) protects all of the OMS. OMS-SPRing is the optical equivalent of the multiplex section shared protection ring (MS-SPRing) standardized by ITU-T in G.841. In SONET, this scheme is frequently referred to as a bidirectional line-switched ring. OCh-SNCP, meanwhile, is the optical counterpart to SDH's subnetwork connection protection, whose SONET ring equivalent is the unidirectional path-switched ring.

Since the target protection time is on the order of milliseconds, the amount of time available to perform optical protection is not long enough to ensure compatibility with SDH protection mechanisms. To deal with this conflicting situation and to give a solution to this potential interaction problem, the two following scenarios provide analysis of the impact of multilayer resilience from the point of view of the best strategy to be applied for any case.

The main objective in multilayer survivability is to avoid double protection of the same circuit. As this goal is not always possible to achieve, several interworking or escalation strategies are presented to cope with this problem. In general, these strategies comprise a set of rules describing the process to follow to coordinate the multilayer recovery mechanisms:

Selective strategy. The survivability mechanism is applied on a layer at a time, with the mechanism deactivated on the rest of the layers, for instance, choosing between protection on either the optical or synchronous ring.
Sequential strategy. Since it is not always possible to recover from all failures with a single-layer survivability mechanism, sometimes it may be necessary to act on another layer. In the sequential strategy, a layer tries to protect from failures, and the second layer would act only if the previous layer were not able to recover from the fault. In the SDH case, it would be necessary either to adapt the automatic protection switching (APS) ring mechanism or to enable hold-off timers for the MS-SPRing (since the SDH SNCP has hold-off timers in some SDH equipment).
Parallel strategy. With this strategy, every layer tries to do its best to protect from the failure. The main drawback is that extra protection resources are needed in all layers, and double protection may cause oscillations in the provided service and an unnecessary preemption of the low-priority traffic.
Interlayer coordination strategy. This strategy consists of interchanging alarms and state information between layers in order to know how and where to activate the survivability mechanism. Although it could seem the best strategy, it has a high complexity because it requires multilayer signaling. This strategy is very difficult to implement, since not only do the systems belong to different vendors, but also there can be several vendors on the same layer.
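The first three strategies above can be sketched as a small decision function. This is only an illustrative model, not a description of any vendor's equipment: the function name `layers_that_act`, the layer names, and the `can_recover` flags are assumptions made for the example, and the interlayer coordination strategy is left out because it depends on multilayer signaling that the sketch does not model.

```python
from enum import Enum


class Strategy(Enum):
    SELECTIVE = "selective"
    SEQUENTIAL = "sequential"
    PARALLEL = "parallel"


def layers_that_act(strategy, layers, can_recover, selected=None):
    """Return the layers whose survivability mechanism is triggered, in order.

    layers      -- layer names, lowest (server) layer first
    can_recover -- dict: layer name -> True if that layer can repair this failure
    selected    -- the single layer left enabled (selective strategy only)
    """
    if strategy is Strategy.SELECTIVE:
        # Only one layer has its mechanism enabled; the rest are deactivated.
        return [selected]
    if strategy is Strategy.PARALLEL:
        # Every layer reacts, whether or not another layer already recovered.
        return list(layers)
    # Sequential: escalate to the next layer only if the previous one failed.
    acted = []
    for layer in layers:
        acted.append(layer)
        if can_recover[layer]:
            break
    return acted


layers = ["optical", "sdh"]
# Hypothetical failures: which layer is able to repair each one is assumed.
fiber_cut = {"optical": True, "sdh": True}
oadm_failure = {"optical": False, "sdh": True}

print(layers_that_act(Strategy.SEQUENTIAL, layers, fiber_cut))     # ['optical']
print(layers_that_act(Strategy.SEQUENTIAL, layers, oadm_failure))  # ['optical', 'sdh']
print(layers_that_act(Strategy.PARALLEL, layers, fiber_cut))       # ['optical', 'sdh']
print(layers_that_act(Strategy.SELECTIVE, layers, fiber_cut, selected="sdh"))  # ['sdh']
```

The parallel case shows the double protection the article warns about: both layers act on a fiber cut that the optical layer could have repaired alone.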

Based on this review of interworking strategies, the following can be concluded: The parallel strategy should be avoided; the interlayer coordination strategy, although conceptually the best, is very difficult to implement in practice; and, thus, only the selective and sequential strategies are of interest for the scenarios under study.

To analyze the impact that optical protection may have on the synchronous survivability mechanism, two scenarios have been considered. Figure 1 shows scenario 1, consisting of an optical ring giving service to some synchronous aggregates of an SDH ring. On the other hand, scenario 2 shows a synchronous ring fully integrated into the optical ring, with all of its aggregates on optical channels.

The optical ring has two fibers with wavelengths traveling in opposite directions: half the wavelengths devoted to working optical channels and the other half to protection channels. To avoid wavelength conversions, the first N/2 wavelengths in the first fiber are working optical channels and the rest are dedicated to protection, while in the other fiber, the assignment is complementary, with the last N/2 wavelengths dedicated to working optical channels and the rest to protection.
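The complementary assignment can be written down in a few lines. The sketch below assumes wavelengths are simply numbered 1 to N with N even; the function name `wavelength_plan` and the fiber labels are illustrative.

```python
def wavelength_plan(n):
    """Assign working/protection roles to wavelengths 1..n on both fibers."""
    if n <= 0 or n % 2:
        raise ValueError("n must be a positive even number of wavelengths")
    half = n // 2
    plan = {"fiber_1": {}, "fiber_2": {}}
    for w in range(1, n + 1):
        # Fiber 1: first half working, second half protection.
        plan["fiber_1"][w] = "working" if w <= half else "protection"
        # Fiber 2: the complementary assignment.
        plan["fiber_2"][w] = "protection" if w <= half else "working"
    return plan


plan = wavelength_plan(4)
print(plan["fiber_1"])  # {1: 'working', 2: 'working', 3: 'protection', 4: 'protection'}
print(plan["fiber_2"])  # {1: 'protection', 2: 'protection', 3: 'working', 4: 'working'}
```

Because every wavelength that is working on one fiber is reserved for protection on the other, a working channel can be switched onto the opposite fiber at the same wavelength, which is why no wavelength conversion is needed.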

To analyze both scenarios, several alternatives will be studied regarding the worst-case node or link failure; disjoint and nondisjoint optical channel routing; and OCh-SNCP (dedicated) and OMS-SPRing (shared) and their synchronous counterparts, SNCP and MS-SPRing. In general, when the optical ring has implemented an OCh-SNCP mechanism, it is enough to disable the protection for a particular optical channel to avoid multilayer protection interaction. That is an effective way to apply the selective interworking strategy. But this configuration would not handle some failures, as we will see.

For the general study of both scenarios, it is supposed that the optical protection scheme is the OMS-SPRing, which is the case that may provoke the multilayer interaction.

Figure 1. In scenario 1, some optical channels are used for SDH aggregate transport, while in scenario 2, all of the aggregates are on the optical ring. The resulting SDH ring is a 2-fiber ring.

In scenario 1 (see Figure 2), the synchronous ring has several aggregates on optical channels. The interaction between the optical and SDH resilience schemes would only appear when failures take place on these SDH aggregates.

Since the OMS-SPRing protection cannot be deactivated for some wavelengths and maintained for the rest, it is necessary to use the sequential strategy to allow the optical protection to act before the synchronous one. Nevertheless, this scenario has the disadvantage that if the failure takes place on an SDH aggregate not carried over an optical channel, then the sequential mechanism creates an unnecessary delay in protection that will affect the availability time of the supported synchronous services.

Today, there are some difficulties in applying such a sequential interworking strategy, and they depend on the kind of synchronous protection used. Only when SDH SNCP is used at a virtual container (VC-x) level can the hold-off times be programmed. By contrast, there is no way to use the sequential strategy with the current MS-SPRing. For this reason, when OMS-SPRing and MS-SPRing are implemented, there is no way to establish either the selective or the sequential escalation strategy; the only option left is to let them operate freely with a parallel strategy. In the medium and long term, it will be necessary to update the MS-SPRing to implement the sequential strategy either with hold-off times or with a new version of the APS protocol (which seems less feasible).
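The effect of a hold-off timer in the sequential strategy can be illustrated with a deliberately simplified model. It assumes the SDH layer waits for the whole hold-off time before switching and that the two layers never overlap; the function names and all timing values are hypothetical and are not taken from any standard or product.

```python
def sdh_switches(optical_recovery_ms, hold_off_ms):
    """Return True if the SDH layer performs its own protection switch.

    optical_recovery_ms -- time the optical layer needs to restore the signal,
                           or None if the optical layer cannot repair the failure
    hold_off_ms         -- time the SDH layer waits after detecting the failure
    """
    if optical_recovery_ms is None:
        return True
    return optical_recovery_ms > hold_off_ms


def sdh_outage_ms(optical_recovery_ms, hold_off_ms, sdh_switch_ms):
    """Outage seen by SDH clients under this simplified model."""
    if sdh_switches(optical_recovery_ms, hold_off_ms):
        # The SDH layer waited for the hold-off time, then switched itself.
        return hold_off_ms + sdh_switch_ms
    # The optical layer recovered before the hold-off timer expired.
    return optical_recovery_ms


# Failure on an aggregate carried over the optical ring (assumed values).
print(sdh_outage_ms(optical_recovery_ms=30, hold_off_ms=100, sdh_switch_ms=50))    # 30
# Failure on an aggregate that is not carried over an optical channel.
print(sdh_outage_ms(optical_recovery_ms=None, hold_off_ms=100, sdh_switch_ms=50))  # 150
```

The second call shows the penalty described for scenario 1: when the optical layer cannot help, the hold-off time is pure added delay before the synchronous protection acts.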

Figure 2. Scenario 1 is presented with two different optical routing configurations. In case "b", unlike case "a", the optical channels are set up with disjoint routing.

With the parallel strategy, when there is no way to do any kind of coordination between the optical and synchronous protections (OMS-SPRing and MS-SPRing), there are some uncertainties about the time taken to perform global multilayer protection:

Time taken to protect from failure. Unfortunately, the parallel strategy does not always provide the quickest protection times. Once a protection mechanism has started, it continues until completion, and depending on the time it begins, the global protection time will be different. The worst case is when the SDH protection is activated before the optical protection finishes. Besides, if automatic protection reversion were programmed, a new period would be added to the global protection time.
Layer performing protection. With the parallel strategy, it is not known which layer will be the first to protect or if both layers will start their protection mechanisms. Ideally, the transport network would get to a state that should correctly be reflected in the management databases (MIB) so that the network-management functions will not be affected.
Effects on client services. Client service interruption is not the only potential problem; some undesirable oscillations (intermittent interruptions) are probable as well. At worst, three interruptions could take place: the initial one caused by the failure, another when the SDH network protects, and a third when automatic protection reversion takes place.
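These uncertainties can be made concrete with a toy timeline. The model below is an assumption made for illustration only: the SDH layer is taken to start its switch whenever it detects the failure before the optical layer has finished, each mechanism runs to completion once started, and reversion adds one more service hit. The function name and the millisecond values are hypothetical.

```python
def parallel_outcome(optical_done_ms, sdh_start_ms, sdh_switch_ms, revertive=False):
    """Toy model of one failure handled with the parallel strategy."""
    # SDH protection is triggered if it starts before the optical layer is done.
    sdh_acts = sdh_start_ms < optical_done_ms
    if sdh_acts:
        # Both mechanisms run to completion; the network settles with the later one.
        settled_ms = max(optical_done_ms, sdh_start_ms + sdh_switch_ms)
    else:
        settled_ms = optical_done_ms
    # One hit for the failure, one if SDH also switches, one if reversion is on.
    interruptions = 1 + int(sdh_acts) + int(revertive)
    return {"sdh_acts": sdh_acts, "settled_ms": settled_ms, "interruptions": interruptions}


print(parallel_outcome(optical_done_ms=40, sdh_start_ms=10, sdh_switch_ms=50, revertive=True))
# {'sdh_acts': True, 'settled_ms': 60, 'interruptions': 3}
```

With these assumed values the network settles later than the optical layer alone would have managed, and the client sees the three interruptions of the worst case.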

To analyze the effect of failures in multilayer survivability, the worst-case optical channel and OADM outages have been considered (see Figure 3), labeled case "a" and case "b", respectively.

a) Failures in the OMS. These failures can be handled by optical protection. To avoid protection interaction, the SDH layer should use a selective (or a sequential) strategy. Only with disjoint optical channel provisioning would the situation be recovered by SDH survivability alone.

Figure 3. Here are the worst-case effects of optical channel and node failures in scenario 1, considering disjoint and nondisjoint routing. The resulting SDH ring is shown, with the dotted lines indicating optical-protection recovery.

b) Failures in optical nodes (OADMs). This kind of failure (which is far less probable than link failures) cuts off all of the optical channels entering, leaving, or passing through the optical node. WDM protection alone is not sufficient to recover from this situation; SDH resilience is also needed. Again, it can be observed that the situation would be handled by the SDH resilience alone if disjoint optical channels were provisioned.

Figure 4. Scenario 2 is presented with two different optical routing configurations. In case "b", unlike case "a", the optical channels are set up with disjoint routing.

In scenario 2 (see Figure 4) all of the SDH aggregates are supported by optical channels, which creates an integrated synchronous/optical ring. For this scenario, optical protection is essential for the necessary survivability, but in some cases, it is necessary to have another level of resilience. Figure 5 presents the optical channel and node (OADM) failures on the resulting SDH ring:

a) Failures in the OMS. As shown in scenario 1, failures in the OMS can be handled by optical protection, and the SDH layer should use a selective (or sequential) strategy. Only with disjoint optical channel provisioning would the situation be recovered by the SDH layer alone.

Figure 5. Worst-case effects of optical channel and node failures in scenario 2, considering disjoint and nondisjoint routing. The resulting SDH ring is shown, with the dotted lines indicating optical protection recovery.

b) Failures in optical nodes (OADM). Apart from disconnecting all optical channels entering, leaving, or passing through the optical node, unlike scenario 1, this kind of failure completely cuts off one SDH node (ADM). That means the situation before the fault cannot be recovered. However, the rest of the SDH ring can be reestablished. If the optical channels have a disjoint routing, the SDH resilience mechanism is enough; otherwise, optical protection would be needed as well.

We have reviewed the complexity of multilayer protection using an example architecture consisting of a synchronous SDH ring on an optical ring. In addition, different strategies regarding multilayer interaction have been studied: selective, sequential, parallel, and coordinated. The first two strategies are the most interesting, since they diminish or avoid the interaction that could appear between the different layers' survivability mechanisms and, therefore, the impact on supported services. The selective strategy should be used whenever possible, and the sequential one can be implemented when there is no other choice.

To study this problem, two different scenarios have been analyzed: one with some SDH aggregates on optical channels (OChs) and the other with all of the SDH aggregates carried on optical channels of the optical ring. For OMS failures, it has been shown that disjoint routing is decisive to reduce the number of affected optical channels and allow for a selective escalation strategy (either optical, with OCh-SNCP, or synchronous, with SNCP), which means that either optical protection or SDH resilience can handle the failure situation. When optical channel routing is not disjoint, SDH protection would not be able to recover from the failure in most cases, and optical protection would be essential.
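The role of disjoint routing for OMS failures can be checked with a short sketch. It assumes that the SDH ring on its own survives a failure only if at most one of its spans is hit, and it covers link (OMS) failures only, not node failures. The span names, link names, and routes below are hypothetical and do not reproduce the figures.

```python
def affected_spans(routes, failed_link):
    """SDH spans whose optical channel crosses the failed optical link."""
    return sorted(span for span, links in routes.items() if failed_link in links)


def sdh_alone_recovers(routes, failed_link):
    """Assumption: SDH ring protection alone copes with at most one lost span."""
    return len(affected_spans(routes, failed_link)) <= 1


def is_link_disjoint(routes):
    """True if no optical link carries more than one SDH span."""
    seen = set()
    for links in routes.values():
        if seen & set(links):
            return False
        seen |= set(links)
    return True


# Each SDH span is mapped to the optical links its channel traverses.
nondisjoint = {"s1": {"A-B"}, "s2": {"A-B", "B-C"}, "s3": {"C-D"}}
disjoint = {"s1": {"A-B"}, "s2": {"B-C"}, "s3": {"C-D"}}

print(is_link_disjoint(nondisjoint), sdh_alone_recovers(nondisjoint, "A-B"))  # False False
print(is_link_disjoint(disjoint), sdh_alone_recovers(disjoint, "A-B"))        # True True
```

In the nondisjoint example a single cut on link A-B takes down two SDH spans at once, so under the stated assumption optical protection would be needed; with disjoint routes the same cut costs only one span.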

For node (OADM) failures, SDH resilience is needed. In scenario 1, SDH resilience is enough if disjoint routing is used. In scenario 2, there is always one node disconnected, but SDH resilience is enough to reestablish the rest of the synchronous ring with disjoint optical channel routing. Thus, if it were not necessary to carry all of the aggregates over optical channels, it would be desirable (from the network topology point of view) to connect both ends of the SDH ring segment to two different OADMs.

When OMS-SPRing is implemented on the WDM ring, optical protection cannot be deactivated for some optical channels and kept for the rest. Thus, we suggest using synchronous restoration (handled from a network-management system) for cases where both optical and synchronous survivability schemes are needed, together with a sequential strategy to let SDH restoration interwork with optical protection.

Finally, it can be concluded that optical channel protection should be used in WDM rings because it can be deactivated if a client network layer (e.g., the SDH layer) requires its own resilience mechanisms. That may be necessary for protection from additional failures different from the ones the optical layer recovers. Additionally, disjoint optical channel routing is essential to minimize the number of affected optical channels so that the client layer alone may efficiently protect its circuits from failures without optical resilience mechanisms.

Antonio Fontalba is a consulting engineer in the technology department of Telefonica (Madrid, Spain).

ITU-T G.805, "Generic Functional Architecture of Transport Networks," 1995.
ITU-T G.841, "Types and Characteristics of SDH Network Protection Architectures," 1995.
ITU-T G.872, "Architecture of Optical Transport Networks," 1999.
Eurescom P.615, "Evolution toward an Optical Network Layer," 1998.
Eurescom P.709, "Planning of Full Optical Networks," 1999.