Intelligent electronic-switching fabrics will enable faster, more efficient edge devices.
BILL WEIR, Power X Networks
The amount of dark fiber now in metropolitan and long-haul networks has led some to speculate that the network of the future will be flat and dumb: Just throw a lot of increasingly cheap bandwidth at the problem and get out of the way. After all, bandwidth prices continue to drop as DWDM and even sub-lambda multiplexing get more capacity out of new and existing fiber plants.
However, bandwidth is at an ever-higher premium inside edge-network switches, which are caught in a squeeze between the core and access networks. With OC-192 (10 Gbits/sec) being deployed in long-haul networks and the "EtherLECs" (local-exchange carriers) getting ready to connect enterprises with 10-Gigabit Ethernet service, today's edge switches are running out of steam. As they aggregate and route multiservice traffic coming in from faster access pipes, they simply can't process and switch it quickly enough to keep the core network full.
And the bandwidth squeeze is just part of the problem. As voice and data converge on packet networks, edge switches must recognize and give appropriate quality-of-service (QoS) levels to voice and other time-sensitive applications. Meanwhile, service providers have to deal with the continuing coexistence of time-division multiplexing (TDM) and packet networks. Maintaining separate TDM and data switches is an expensive proposition that uses up a lot of space and power in crowded collocation facilities and carrier hotels.
This situation calls for a radical new approach to edge-switch architecture. Far from getting dumber, edge switches need to evolve into a much more intelligent life form with a new generation of electronic brain: a protocol-agnostic switch fabric that can provision multiple service levels, manage bandwidth across a variety of service types, and process priority traffic at wire speed.
The network-centric computing model of the Internet economy implies an intelligent network, but the brains are not evenly distributed throughout the access, edge, and core infrastructures. In the long-haul network, traffic has been sorted and aggregated into big SONET container ships, and the core switches route them simply by reading the equivalent of a bar code on the outside (see Figure 1). Packets are encapsulated rather than processed, and the main requirement of the infrastructure is raw bandwidth and reliability. The pipes are fat and dumb, and an emerging generation of terabit switches moves the SONET supertankers through them at light speed.
The access network requires more intelligence than the core, but it is still a relatively simple place that focuses on basic first-stage aggregation. The main purpose of access switches is to aggregate a lot of low-utilization links onto OC-3 and OC-12 (today), and OC-48 and OC-192 (in the not-too-distant future) backbone links feeding the edge. Cost is a big factor at this point, and blocking architectures are acceptable. True, the access pipes that feed these switches are getting faster, but they typically consist of nailed-up circuits that deliver simple bandwidth to customers. Most of the actual service provisioning takes place in the edge network, where carriers can achieve economies of scale through higher levels of aggregation.
In the container-ship metaphor, the edge switches play the role of custom freight forwarders, who receive all the packages from the access pipes and sort them according to such factors as traffic type and security requirements and the terms of service-level agreements (SLAs). Carriers need edge switches with the intelligence and bandwidth to support as many different service levels as possible.
However, service providers are severely limited by the constraints of today's typical data-communications switch. These devices use an output-queuing model that assumes the aggregate data passes through a single path in the switch fabric and arrives at an output port without any delay. Hence, the fabric does not have to manage the traffic flow. But every output channel has to be able to handle the aggregate bandwidth of the entire switch.
The intelligence in these switches is typically implemented through a shared-memory architecture. The switch fabric contains a global memory, and each line card can write into and read out of it. The performance of such switch fabrics is scaled by using faster memory and thus depends on the ability of the semiconductor industry to make process improvements. But while such improvements continue to be achieved as quickly as ever, they can't begin to keep pace with the growth of Internet traffic: In the 18 months that it takes to double the performance of memory chips, edge-switch bandwidth must increase by a factor of four.
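The scaling gap described above can be worked through as simple arithmetic. The sketch below is illustrative only: it uses just the two growth rates quoted in the text (memory performance doubling and required bandwidth quadrupling every 18 months), and the function and constant names are hypothetical.

```python
# Growth figures taken from the text; everything else is an illustrative assumption.
PERIOD_MONTHS = 18
MEMORY_GROWTH_PER_PERIOD = 2   # memory-chip performance doubles each period
DEMAND_GROWTH_PER_PERIOD = 4   # required edge-switch bandwidth quadruples each period


def shortfall_factor(periods: int) -> int:
    """Ratio of required bandwidth to memory-limited bandwidth after `periods`."""
    demand = DEMAND_GROWTH_PER_PERIOD ** periods
    memory = MEMORY_GROWTH_PER_PERIOD ** periods
    return demand // memory  # always exact, since 4**p / 2**p == 2**p


for periods in range(4):
    print(
        f"{periods * PERIOD_MONTHS:2d} months: "
        f"demand x{DEMAND_GROWTH_PER_PERIOD ** periods}, "
        f"memory x{MEMORY_GROWTH_PER_PERIOD ** periods}, "
        f"shortfall x{shortfall_factor(periods)}"
    )
```

Under these figures the shortfall itself doubles every 18 months, which is why faster memory alone cannot close the gap.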
Making the memory wider can provide some additional performance gains, but this tactic stops helping once the width of the memory reaches that of an ATM cell. The bottom line is that architectures based on output queuing and shared memory become impractical as switch throughput moves above 40 Gbits/sec.
To address these problems, the industry developed a more intelligent switch fabric by using a crossbar architecture. Crossbar switch fabrics use space-division multiplexing based on more of an input-queuing model. There isn't a single path through the fabric that has to be able to carry the aggregate bandwidth of the switch. Instead, each of the connections through the fabric simply has to handle the bandwidth of one line card, which makes the switch much more scalable.
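A rough way to see the difference in scalability is to compare the worst-case bandwidth that a single path must carry under each model. The following Python sketch is a simplification under stated assumptions: the function names are hypothetical, the 16-port, 10-Gbit/sec configuration is chosen only for illustration, and any internal speedup a real crossbar might use is ignored.

```python
def output_queued_path_gbps(ports: int, line_rate_gbps: float) -> float:
    """Output-queuing model: an output channel must absorb the aggregate
    bandwidth of the whole switch in the worst case."""
    return ports * line_rate_gbps


def crossbar_path_gbps(ports: int, line_rate_gbps: float) -> float:
    """Crossbar model: each connection through the fabric carries the
    bandwidth of one line card, regardless of port count."""
    return line_rate_gbps


# Illustrative configuration (an assumption, not a product specification).
PORTS = 16
LINE_RATE_GBPS = 10.0

print("output-queued path:", output_queued_path_gbps(PORTS, LINE_RATE_GBPS), "Gbit/s")
print("crossbar path:     ", crossbar_path_gbps(PORTS, LINE_RATE_GBPS), "Gbit/s")
```

The per-path requirement grows with port count in the first case and stays flat in the second; the price of that flat curve is the scheduling problem described next.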
However, since all the traffic isn't going through a single conduit, the switch fabric has to decide where the constituent packets and cells go, and when. These decisions are made based on what types of traffic are queued up on the input side and what QoS levels are assigned to each. Such arbitration has to be handled efficiently and fairly, and somehow keep pace with the ever-increasing amount and complexity of traffic bombarding the edge network.
The construction of a super-switch by putting together a network of smaller switches is another approach to scalability. However, these multistage interconnect networks (MINs) introduce a new set of problems. There are many different paths that each bit of data can take, and choosing the right one imposes considerable overhead. Also, since there are multiple switches the data must go through to get from an input port to an output port, unacceptable latency is introduced into the data path.
The MIN architecture is also a relatively expensive alternative. The component switches have to use interconnects internally to communicate with one another, limiting the number available for provisioning outside connections. The number of pins in a chip determines the cost of silicon to a large extent, and the use of pins to talk to other chips wastes a lot of the silicon's capacity. Today, MINs are being used to build multiple terabit switches for the core network, where traffic is switched at a much coarser level. In general, MINs are commercially practical only in environments that can't be accommodated by single-stage switches.
In reality, the next-generation switch fabrics being developed by such companies as Power X Networks, IBM, and Vitesse Semiconductor are hybrids that incorporate elements from two or more of the different approaches discussed above. However, they are fundamentally crossbar switches that use an input-queuing model to establish virtual pipes through the fabric. The challenge is to ensure that these paths operate at full line speed, which requires a new breed of intelligent switch fabrics with sophisticated schedulers for traffic management.
Every switch fabric must have a brain somewhere that can look at all the traffic on all the switch ports (or as much as it can see at once) and figure out how to move the traffic through the fabric. A path is determined and reserved, the appropriate data is herded through it, and the path is released. The brain, or "arbiter" as it is more commonly called, is constantly making and breaking these connections through the fabric.
To boost performance and reduce component size and cost, some switch-fabric designers put the crossbar and arbiter functions on the same chip. But this approach has a flaw: As the number of ports in the fabric increases beyond four or eight, the combined arbiter/crossbar chips suffer from "arbiter contention." In theory, the port count can be scaled upward by using multiples of the 4- or 8-port chips, but each of them can only see a fraction of the total ports in the switch. Consequently, the multiple-function chips have to first get each other's approval to set aside paths, and this arbitration-by-committee process introduces system overhead, latency, and contention that increases geometrically as ports are added to the switch. The design saves a chip but reduces the number of packets that can be handled at wire speed. Even more critical, true QoS can no longer be guaranteed, because the arbitration function can no longer "see" all packets across all ports in real time.
A global arbiter that can manage traffic across the entire switch fabric can solve this central-intelligence problem (see Figure 2). A new architecture with such an arbiter offers 16 prioritization levels across 16 ports today, with 32 ports expected by mid-2002 and 64 ports after that. The arbiter monitors all the line cards all the time at wire speed. When multiple ports send traffic through the fabric at once to the same destination, it looks deeper into the packets and prioritizes them. Traffic that can be delayed gets sent to local memory that is still on the ingress side of the switch, so the scheme amounts to virtual output queuing.
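The behavior described here can be sketched as a toy model. The Python below is a minimal illustration, not the scheduler of any actual product: the class name, the choice of priority 0 as most urgent, and the greedy highest-priority-first matching are all assumptions made for the example. It keeps delayed traffic in per-destination queues on the ingress side and, once per cell time, lets a single arbiter that sees every queue grant at most one cell per input and per output.

```python
from collections import deque


class GlobalArbiter:
    """Toy single arbiter with a view of every ingress-side queue."""

    def __init__(self, ports: int, priorities: int = 16):
        self.ports = ports
        self.priorities = priorities
        # voq[inp][out][prio]: cells held at ingress port `inp` for output `out`.
        self.voq = [
            [[deque() for _ in range(priorities)] for _ in range(ports)]
            for _ in range(ports)
        ]

    def enqueue(self, inp: int, out: int, prio: int, cell) -> None:
        """Park a cell on the ingress side (priority 0 is assumed most urgent)."""
        self.voq[inp][out][prio].append(cell)

    def arbitrate(self):
        """One cell time: grant at most one cell per input and per output."""
        requests = []
        for inp in range(self.ports):
            for out in range(self.ports):
                for prio in range(self.priorities):
                    if self.voq[inp][out][prio]:
                        requests.append((prio, inp, out))
                        break  # only the most urgent cell of each pair competes
        requests.sort()  # most urgent first; ties broken by port number
        busy_inputs, busy_outputs, grants = set(), set(), []
        for prio, inp, out in requests:
            if inp not in busy_inputs and out not in busy_outputs:
                busy_inputs.add(inp)
                busy_outputs.add(out)
                grants.append((inp, out, self.voq[inp][out][prio].popleft()))
        return grants


arbiter = GlobalArbiter(ports=4)
arbiter.enqueue(0, 2, 5, "data-A")   # input 0 -> output 2, lower priority
arbiter.enqueue(1, 2, 0, "voice-B")  # input 1 -> same output, top priority
arbiter.enqueue(0, 3, 7, "data-C")   # input 0 -> a different output
print(arbiter.arbitrate())  # voice-B wins output 2; data-C uses idle output 3
print(arbiter.arbitrate())  # data-A, held at ingress, goes one cell time later
```

Because "data-A" waits in a queue dedicated to output 2, it does not block "data-C", which is bound for an idle output. The fixed tie-breaking by port number is not fair under sustained load; a production scheduler would need a fairness mechanism that this sketch omits.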
By adding the functionality of the arbiter chip, service providers are getting true switch-wide prioritization performance. With such a single arbiter design, there is no chip-to-chip communication creating a lot of overhead, and the switch fabric doesn't have to deal with a subset of its ports or a subset of possible packet sizes.
The ability to handle tiny packets efficiently is critical in the Internet environment, because the Transmission Control Protocol (TCP) that runs over Internet Protocol (IP) has receiving hosts return 40-byte acknowledgement packets for the data packets they receive. Up to 50% of the traffic on an IP network can consist of these acknowledgement packets, and while they don't need high priority, they can't be parked in memory indefinitely. They will rapidly overflow memory queues if they aren't processed constantly. The switch has to be able to arbitrate across the entire fabric every 30 to 40 nsec to keep up, something only a global arbiter can do at wire speed.
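The 30-to-40-nsec figure follows from the packet size and the line rate. As a quick check, the short Python sketch below computes how long a 40-byte packet occupies a link; the function name is hypothetical, and the 10-Gbit/sec rate is the nominal OC-192 figure used earlier in this article.

```python
def packet_time_ns(packet_bytes: int, line_rate_gbps: float) -> float:
    """Time one packet occupies a link, in nanoseconds.

    Bits divided by Gbit/s gives nanoseconds directly.
    """
    return packet_bytes * 8 / line_rate_gbps


# A 40-byte acknowledgement packet on a nominal 10-Gbit/s port.
print(packet_time_ns(40, 10.0))  # 32.0
```

With back-to-back minimum-size packets arriving on a port, a new arbitration decision is needed roughly every 32 nsec, which falls inside the range quoted above.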
The architecture can also handle any amount of any type of traffic, which is a key advantage in edge switches. Unlike core switches, edge systems have to look at each packet that comes in, figure out what type of traffic it is, where it is going, and how fast it has to get there. All of this complex processing and switching has to take place at wire speed and deliver any required QoS.
With the increasing commoditization of bandwidth, both incumbent and newcomer service providers are eyeing higher-margin value-added services. Edge switches based on a flexible, protocol-agnostic, high-throughput architecture can be used to provision virtually any type of service or SLA. They can handle a mixture of voice and data and make sure that legacy TDM traffic moves through at a constant rate in concert with each tick of the SONET master clock.
Edge switches are electro-optical devices that must convert lightwaves to electronic signals before applying any intelligence, then change these signals back to light before sending the processed traffic on its way. Now that enterprise customers are using fiber to access all-optical metropolitan-area and wide-area networks, injection of electronics in between may seem like an unnecessary complication. In theory, all-optical switches would provide an end-to-end optical infrastructure, and first-generation products are now available.
However, these devices are in the early stages of development, and progress is slow. Also, they are being designed for use in the long-haul core, where traffic direction is relatively simple. All-optical switches typically function as big add/drop multiplexers that take a very fat stream of long-haul traffic and peel off a tributary stream at carrier points of presence. That is very primitive switching that goes on at the optical interface level, which is beneath the physical layer at the bottom of the seven-layer Open Systems Interconnection model. Traffic is routed according to labels on the outside of the 90B-by-9B SONET container ships. The individual packets are invisible at the optical or even physical layers and thus could not be processed in any case. Switches must operate at least at Layer 2 (the data-link layer) or Layer 3 (the network layer) to do any packet processing or sophisticated routing.
The manufacturers of all-optical switches are using some rather exotic bubble and mirror technologies to create what are basically programmable patch panels in carrier hotels and telco central offices. These devices eliminate the need to switch traffic manually by having a technician unplug a fiber cable from one jack on the patch panel and move it to another. Instead, this process is automated.
The all-optical technology is primarily applicable to situations in which an entire fiber, with all of its constituent wavelengths, is being switched as a unit. That's often the case in the long-haul network, but edge switches must operate on a much more granular level. All-optical switches also raise reliability issues: Many depend on largely unproven micro-electromechanical technologies, and mechanics are inherently slower and less reliable than electronics.
Even if these reliability issues are resolved, the all-optical switches are still too slow and inflexible to provide the intelligence needed to make all the complex routing decisions that characterize the edge network. Edge switches will have to rely on electronic brains for the foreseeable future.
If the world of infinite bandwidth envisioned in George Gilder's Telecosm is to be realized, the edge network needs a brain transplant. The huge wealth of fiber capacity throughout the public-network infrastructure can be exploited only if the edge switches can process and aggregate traffic from the access network fast enough to feed the OC-192 pipes of the long-haul core.
The carrier edge is really the intelligent edge: the last point where intelligence is applied to packets before they disappear into the SONET container ships in the core (see Figure 3). Edge switches based on a crossbar-switch fabric with global arbitration will support any type of traffic the access network can throw at them and confer a big competitive advantage on service providers. Finally, they can start turning all that fiber into gold.
Bill Weir is vice president of marketing for Power X Networks, San Jose, CA.