America Online ramps up IP transport network with DWDM over dark fiber
Metro dense wavelength-division multiplexing rings reduce network costs and provide greater flexibility.
Mike Runge, America Online
Bob Welch, Ericsson
America Online has built success on a service that's intuitive and easy to use. But behind the easy-to-use exterior lies a state-of-the-art network infrastructure that hosts several interactive services: America Online (AOL), CompuServe, ICQ, AOL Instant Messenger, AOL.COM, Netscape Netcenter, Digital City, Spinner, WinAmp, Moviefone, and When.com.
The infrastructure consists of five primary data centers: three in northern Virginia, one in Columbus, OH, and one in Mountain View, CA. Together, these data centers house in excess of 20,000 servers. The Table provides key statistics on AOL's production network.
With traffic between the network's data centers growing rapidly, especially for the large complexes located in northern Virginia, AOL recently deployed an optical transport network using DWDM. AOL selected the metro DWDM solution from among several alternatives and deployed it in the past six months. The network has been successfully tested in the production environment, and the design represents a novel approach, especially given the ring sizes and the number of channels already in use.
Dubbed AOLwave, the network interconnects the data centers and links those sites to AOL points of presence (PoPs) within primary Internet exchanges in the northern Virginia area. The majority of the traffic is carried on packet-over-SONET OC-12c (599.04-Mbit/sec payload) and OC-48c (2.4-Gbit/sec payload) circuits. Other traffic types such as Gigabit Ethernet and Fibre Channel are either in service or planned for the near future.
Before AOL implemented DWDM over dark fiber, the company relied entirely on circuits provided by carriers. This approach caused a number of problems:
- AOL experienced difficulty getting high-bit-rate services from carriers.
- The company encountered problems adding new services and changing existing ones in a timely manner.
- The existing transport hardware took up too much space and consumed too much power.
- The price of data transport was too high.
It was clear that AOL needed a new, scalable solution. Increased member usage of the AOL service and the addition of new, popular, high-usage brands like Spinner were driving the need for more bandwidth between data centers.
The transport solution had to reduce network cost and complexity. It had to be cheaper than existing services and allow for the integration of new services using protocols other than SONET. It also had to support frequently changing bit rates, from OC-3 (155 Mbits/sec) to OC-12 (622 Mbits/sec) to OC-48 (2.5 Gbits/sec). Therefore, the system needed to be bit-rate- and protocol-transparent.
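The SONET rates quoted above all follow from the OC-1 base rate of 51.84 Mbits/sec: an OC-n signal runs at n times that rate. A minimal Python sketch shows where the rounded figures come from; the function name is illustrative only and is not taken from any AOL or Ericsson tool.

```python
OC1_MBITS_PER_SEC = 51.84  # SONET OC-1 (STS-1) line rate


def oc_line_rate_mbits(n: int) -> float:
    """Return the line rate of an OC-n signal in Mbits/sec."""
    if n < 1:
        raise ValueError("OC level must be a positive integer")
    return n * OC1_MBITS_PER_SEC


# Print the line rates for the OC levels mentioned in the article.
for level in (3, 12, 48, 192):
    print(f"OC-{level}: {oc_line_rate_mbits(level):.2f} Mbits/sec")
```

A bit-rate-transparent transport system has to carry any of these rates on a wavelength without a hardware change, which is why the requirement is stated in terms of transparency rather than a fixed list of rates.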
It also had to support large ring sizes. Some AOL fiber spans were in excess of 70 km, which excluded many vendors. The total ring size was greater than 140 km.
To make better use of floor space and power, AOL required a high ratio of channels per floor tile. The transport solution also needed to provide optical protection to guard against fiber cuts.
Finally, it had to fit within AOL's existing management paradigm. Because most of AOL's network staff is oriented toward data communications, AOL was looking for a solution where employees could use their IP engineering skills with only a minimal amount of training. Thus, the configuration management, fault isolation, and performance management needed to be Internet-centric and support command-line interface (CLI), syslog, simple network-management protocol (SNMP) trap, SNMP poll, network time protocol (NTP), etc.
After examining several economic models and available technologies, AOL decided that the expansion of SONET rings, whether carrier supplied or internally built, was cost prohibitive, and the inflexibility of reprovisioning the network to handle internal needs was deemed unacceptable. A planned four-fold expansion of router interface speeds, from OC-12c to OC-48c, would have also required OC-192 (10-Gbit/sec) SONET configurations to handle large increases in bandwidth. A significant increase in floor space and power would be necessary to support this expansion using SONET.
AOL ultimately decided to deploy stacked DWDM rings to connect the northern Virginia data centers and Internet exchange PoPs. This solution could easily scale to handle planned bandwidth needs in almost any existing bit rate or protocol format. It was also very dense, saving precious floor space. A traditional OC-192 SONET solution would require several floor tiles of space and only provide four OC-48 circuits. In comparison, using DWDM, 16 protected OC-48 circuits fit in one floor tile (19 in. wide by 7 ft high).
AOL tested alternative suppliers in early 1999 and ultimately selected Ericsson's ERION metro-based 32-channel DWDM solution for the project. The cost differential between leased dark fiber with DWDM technology and leased SONET transport services was large enough to easily justify the DWDM purchase. The system was initially deployed in November 1999 and has been carrying traffic between data centers and PoPs since then (see Figure).
By employing metro DWDM on top of leased dark fiber, AOL has cut provisioning times for new transport connections between data centers, including OC-48c, from the normal six to eight weeks or longer to less than one day. This reduction has proved beneficial in integrating services such as Netscape and Spinner into the network seamlessly. It is also easy to provide additional bandwidth for new applications despite difficulty in forecasting those capacity needs.
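The density comparison above can be made concrete with a little arithmetic. The Python sketch below uses the figures from the text: four OC-48 circuits per OC-192 SONET system and 16 protected OC-48 circuits per DWDM floor tile. The text says only that the SONET system needs "several" floor tiles, so the value of three tiles used here is an assumption chosen purely for illustration.

```python
OC1_MBITS_PER_SEC = 51.84  # SONET OC-1 line rate


def oc_rate_gbits(n: int) -> float:
    """Line rate of an OC-n signal in Gbits/sec."""
    return n * OC1_MBITS_PER_SEC / 1000.0


def circuits_per_tile(circuits: int, tiles: float) -> float:
    """OC-48 circuits delivered per floor tile of equipment."""
    if tiles <= 0:
        raise ValueError("tile count must be positive")
    return circuits / tiles


# Figures stated in the article.
DWDM_CIRCUITS_PER_TILE = 16   # protected OC-48 circuits in one floor tile
SONET_CIRCUITS = 192 // 48    # an OC-192 carries four OC-48 circuits

# ASSUMPTION: the article says "several floor tiles"; 3 is illustrative only.
ASSUMED_SONET_TILES = 3

sonet_density = circuits_per_tile(SONET_CIRCUITS, ASSUMED_SONET_TILES)
dwdm_density = circuits_per_tile(DWDM_CIRCUITS_PER_TILE, 1)
dwdm_gbits_per_tile = DWDM_CIRCUITS_PER_TILE * oc_rate_gbits(48)

print(f"SONET: {sonet_density:.2f} OC-48 circuits per tile (assumed tile count)")
print(f"DWDM:  {dwdm_density:.0f} OC-48 circuits per tile")
print(f"DWDM:  {dwdm_gbits_per_tile:.1f} Gbits/sec of protected capacity per tile")
print(f"Density ratio: {dwdm_density / sonet_density:.0f}x")
```

Under the three-tile assumption, the DWDM system carries 12 times as many OC-48 circuits per floor tile. The ratio scales directly with whatever tile count the SONET system actually requires.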
Due to the transparent nature of the transponders, the AOLwave network enables a planned four-fold increase in router interface speeds to OC-48c without any major modifications to the transport network. This increase allows for an easy upgrade, without a corresponding increase in cost or floor space to support additional transport gear. Additionally, expansion to support distributed data storage via Fibre Channel over DWDM is possible. LAN extensions of corporate networks for AOL offices using Gigabit Ethernet have been seamlessly integrated into the existing network, something not possible using SONET transport between locations. The incremental cost of adding new channels is very attractive and allows AOL to leverage the system to provide services to other areas.
The ability to provide fully protected traffic between data centers was a mandatory requirement. The DWDM system promised automatic optical protection through its patented FlexRing mechanism, which keeps standby amplifiers in a glow state. When needed, these amplifiers cycle up in 10 to 30 msec, depending on the length of the ring, to automatically reroute traffic through the secondary path. AOL tested this feature in the lab before deploying the network.
Proof of the importance of automatic optical protection occurred recently: One of AOL's dark-fiber suppliers suffered a fiber cut, which could have affected a sizable portion of AOL's traffic. The DWDM system detected the fault and automatically switched over to the backup path without affecting any of the services. The only indication that anything had happened was a message logged via syslog to AOL's network-operations center. AOL's dark-fiber provider restored the link later that day, but the automatic protection on the DWDM system was a vital component in assuring fault-free operation.
Network management at AOL is IP-centric. Therefore, the system's interface was modified to support AOL's specific needs. Like many other service providers, AOL uses a CLI for configuration management and maintenance, and syslog and SNMP for fault and performance management. A relatively small staff operates AOL's entire IP data network using a common management methodology built upon this model. Since the model is built around a text-based CLI, it is easy to scale network management through the use of programs written in scripting languages such as Perl (Practical Extraction and Report Language), Korn shell, and Expect.
By retaining the CLI orientation, any authorized user can operate any component of the network from any location. AOL is able to manage the network using its data-communications orientation, instead of adopting the telecommunications approach commonly used in most transport implementations.
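The scripting approach described above relies on management data arriving as plain text. As a rough illustration, the Python sketch below parses BSD-style syslog lines and picks out protection-switch events. The hostnames, the "optmgr" tag, and the message text are hypothetical and invented for this example; they are not actual ERION or AOL log output, and the article names Perl, Korn shell, and Expect rather than Python as the languages in use.

```python
import re

# HYPOTHETICAL sample data: hostnames, tag, and message format are invented
# for illustration and do not reproduce real ERION or AOL syslog output.
SAMPLE_LOG = """\
<189>Mar  3 02:14:07 dwdm-node-a optmgr: PROTECTION-SWITCH ring=1 channel=12 path=secondary
<190>Mar  3 02:14:09 router-1 bgp: neighbor state change
<189>Mar  3 09:41:55 dwdm-node-a optmgr: PROTECTION-SWITCH ring=1 channel=12 path=primary
"""

# BSD syslog layout: <PRI>TIMESTAMP HOSTNAME TAG: MESSAGE
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<tag>[^:]+): (?P<msg>.*)$"
)


def parse_line(line):
    """Parse one syslog line into a dict, or return None if it does not match."""
    match = SYSLOG_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # PRI encodes facility * 8 + severity.
    record["facility"], record["severity"] = divmod(int(record["pri"]), 8)
    return record


def protection_events(log_text):
    """Return one dict per (hypothetical) PROTECTION-SWITCH message."""
    events = []
    for line in log_text.splitlines():
        record = parse_line(line)
        if record and record["msg"].startswith("PROTECTION-SWITCH"):
            # Remaining tokens are key=value pairs in this invented format.
            fields = dict(kv.split("=", 1) for kv in record["msg"].split()[1:])
            events.append(
                {"host": record["host"], "timestamp": record["timestamp"], **fields}
            )
    return events


for event in protection_events(SAMPLE_LOG):
    print(event)
```

Because the input is ordinary text, the same filter could be pointed at any device that logs via syslog, which is the scaling property the CLI-and-syslog model is meant to provide.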
Low-cost bandwidth made possible via DWDM has also allowed AOL to remove some of the traditional constraints on server placement. Affordable, high-bandwidth connections enable the splitting of services such as backup and mass disk storage across sites. This flexibility results in savings far beyond direct network expenses, allowing the business to operate more efficiently.
America Online has successfully deployed a metro DWDM solution to scale its intersite networks to handle current and future needs. Since the original installation, AOL has added nodes onto the existing rings and added many new channels of capacity as needs have arisen. With this transport solution, AOL lowered operating costs, improved flexibility, and continued to operate within existing management models. Future deployments will replicate this solution in other cities.
Mike Runge is a network architect at America Online Inc. (Reston, VA). He can be reached at email@example.com. Bob Welch is manager of optical networks at Ericsson Inc. (Richardson, TX). He can be reached at firstname.lastname@example.org.
Table. Key statistics on AOL's production network
- Multiple data centers (larger than six football fields of raised floor)
- Points of presence: eleven U.S. domestic and seven international
- Direct Internet peers: more than 45
- Internet bandwidth: >20 Gbits/sec
- AOL peak simultaneous users: 1.6 million
- AOL members: over 22 million
- AOL member time online per day: average 64 min
- E-mail messages delivered per day: 120 million
- Web traffic, URLs per day: 5.6 billion
- Web traffic, sustained peak rate: 104,000 URLs/sec
- Web speed: fastest as measured by Inverse in 14 of the past 15 months
- Average measured Internet packet loss: <1% (domestic and international)
- Average measured Internet latency: <65 msec (domestic and international)
- Network availability over any 30-day period for the last 12 months: >99.995%
Source: America Online Inc.