Forty-gigabit-per-second interconnects in the data center are poised to take off now that the various server, switch, cabling, and transceiver pieces have finally come together.
After about a six-month delay, Intel finally released its next-generation "Romley" architecture that offers 10 cores per microprocessor and a PCI Express 3.0 bus that supports faster I/O. With the servers and switches therefore ready to go, 40G interconnects using direct attach copper (DAC), active optical cables (AOCs), and optical transceivers should be in demand as data center infrastructures begin a major upgrade cycle.
10G faces many issues
Once servers are upgraded, faster uplinks to top-of-rack switches are needed. But the 1G-to-10G transition is fraught with issues. In the past, server suppliers included Gigabit Ethernet (GbE) RJ-45 LAN-on-motherboard (LOM) ports for "free" -- but a dual-port 10GBase-T controller today costs far too much for such largesse. Meanwhile, with Cat5e almost free as well, the interconnect was never a serious cost issue. Now it is.
Server companies offer 10G ports on pluggable "daughter cards" that block out aftermarket competitors and ensure high prices. Daughter cards come in different flavors: 1G and 10GBase-T, two to four SFP+ ports, or dual QSFP with a path to 100G CXP and CFP/2 in the future. As server manufacturers are making a lot of money on the 10G/40G upgrades, this raises the question, "Will server companies ever return to the LOM model, where buyers consider it a freebie?"
Meanwhile, 10GBase-T has had problems with high power consumption, size, and cost. This has left the door open for SFP+ DAC cabling to move in while 10GBase-T suppliers build 28-nm parts. That shift changed the entire industry. But DAC has its share of issues too: it electrically connects two different systems together, and not all SFP+ ports are alike.
LightCounting estimates that 2012 will show about 1 million 10GBase-T ports actually filled, representing about 500,000 links -- almost what can be found in a single large data center today with 1000Base-T! SFP+ DAC demand is shaping up to be about 2.5-3.5 million ports filled, mostly to link servers to top-of-rack switches at less than 7 m.
On the optical end, SFP+ AOCs are on the near-term horizon, and optical transceivers are typically being used to link switches together over reaches greater than 7 m. LightCounting forecasts about 6 million 10G SFP+ short-reach (SR) and long-reach (LR) optical transceivers will ship in 2012.
40G the "next big thing"
Upgrading server-switch links from 1G to 10G forces switch uplinks that connect top-of-rack to end-of-row and aggregation switch layers to jump to 40G. However, as data center operators emerge from the economic recession, budgets are still very tight and "incremental upgrades" are the way operators are buying. Adding 10G/40G links "as needed" is the current buying practice.
While 100G seems to get all the trade show and press coverage, 40G is where the money is for the next two to three years. Data centers are just hitting the need for about 4G to 6G, never mind 10G; so many data centers are in a transitional, upgrade-as-needed state. The so-called mega data centers at Google, Facebook, Microsoft, etc., at $1 billion apiece, do not represent the mainstream data center -- although they garner a lot of attention and awe.
Chasing the transceiver opportunity 40G will present, multiple transceiver suppliers have jumped at offering 40G QSFP SR transceivers and Ethernet AOCs for applications of less than 50 m. Over 10 transceiver companies have announced transceivers and/or AOCs, and more suppliers are coming! Technical barriers to entry are low, and cost-sensitive Internet data centers (especially in China) are likely to gobble these up in volume.
40/100G transceivers: An introduction
Optical modules for 40G and 100G come in two main "flavors" in the data center: short reach (SR) for ~100 m using multimode fiber and long reach (LR) for 100 m to 10 km using singlemode fiber.
SR transceivers are typically used to connect computer clusters and various switch layers in data centers. Several SR transceivers can reach ~300 m with OM4 fiber, but somewhere between 125 and 200 m the economics of the fibers and transceivers justify converting to singlemode optics -- and at even shorter distances for 25G signaling. Data rates of 40G are typically deployed as four 10G lanes using QSFP or CFP MSA transceivers. SR modules use eight multimode fibers (four for each direction), VCSEL lasers, and typically a QSFP MSA form factor. LR4 uses edge-emitting lasers and multiplexes the four 10G lanes onto two singlemode fibers capable of 10-km reach within either QSFP or CFP form factors. Both SR and LR4 QSFPs can be used in the same switch port without any issues -- just plug and play, from 1 m to 10 km.
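The 40G lane-and-fiber arithmetic above can be sketched in a few lines (the figures -- four 10G lanes, eight multimode fibers for SR, two singlemode fibers for LR4 -- come from the text; the helper function is purely illustrative):

```python
# Illustrative sketch of the 40G lane/fiber math described above.
# All figures (4 x 10G lanes, fiber counts) are from the article text.

def link_summary(name, lanes, lane_rate_gbps, fibers):
    """Summarize a parallel-lane optical link as a one-line string."""
    total = lanes * lane_rate_gbps
    return f"{name}: {lanes} x {lane_rate_gbps}G = {total}G over {fibers} fiber(s)"

# 40G SR4: four 10G lanes, one multimode fiber per lane per direction.
print(link_summary("40G SR4 (QSFP, MMF)", lanes=4, lane_rate_gbps=10, fibers=4 * 2))

# 40G LR4: the same four 10G lanes wavelength-multiplexed onto a
# duplex singlemode pair (one fiber per direction).
print(link_summary("40G LR4 (QSFP/CFP, SMF)", lanes=4, lane_rate_gbps=10, fibers=2))
```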
But this is not so for 100G. Modules for 100G SR applications use 20 multimode fibers, VCSELs, and typically the CXP MSA form factor. Although specified to 100 m, these modules are typically used to link large aggregation and core switches at less than 50 m, as 20 multimode fibers become very expensive very fast as the reach grows; multimode fiber is about 3X more expensive than singlemode fiber.
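The fiber-plant economics behind that point can be made concrete with a rough relative-cost sketch. The 20-fiber count and the ~3X multimode price premium are from the text; the two-fiber duplex-singlemode alternative and the unit costs are illustrative assumptions:

```python
# Rough relative fiber-plant cost per unit of cable run (unit SMF cost = 1).
SMF_UNIT_COST = 1.0
MMF_UNIT_COST = 3.0   # per the article: multimode ~3X singlemode per fiber

sr_fibers = 20        # 100G SR CXP: 10 multimode fibers each direction
duplex_fibers = 2     # assumed duplex-singlemode alternative

sr_cost = sr_fibers * MMF_UNIT_COST        # 60 cost units
duplex_cost = duplex_fibers * SMF_UNIT_COST  # 2 cost units
print(f"100G SR fiber plant costs {sr_cost / duplex_cost:.0f}x a duplex "
      f"singlemode run per unit length")
```

This 30x gap per unit length is why the economics tip toward singlemode as the reach grows.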
Only in 2012 have multiple transceiver companies started to unveil CXP 100G SR transceivers, whereas the 40G QSFP transceivers and AOCs have been available since about 2008.
LightCounting expects the transceiver industry will do its traditional pricing act of "Let's all cut our own throats on price and see who bleeds to death last." As a result, 40G SR parts are likely to see a very rapid price drop from about $250 today to under $190 for fully compliant OEM offerings next year. We even have seen "plug and hope they play" parts at $65. (But you get what you pay for!) OEM prices for Ethernet AOCs can be found below $190 -- and that is for a complete link with both ends and fiber!
The 40G QSFP MSA uniquely supports SR at approximately 100 m with multimode fiber or 10 km with duplex singlemode fiber -- all in the same QSFP switch port. Companies such as ColorChip, Sumitomo, and a few others offer QSFP parts, while Oclaro (via its merger with Opnext), NeoPhotonics, Finisar, InnoLight, etc., offer larger CFP devices. QSFP enables 36 ports per line card compared with only four for CFPs. Running at 32 W, the CFP is affectionately referred to at LightCounting as the "Compact Frying Pan"; although popular in telecom, it is not in datacom! OEM prices range from $2,000 to $3,000 depending on data center or telecom features.
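The faceplate-density gap cited above works out as follows (a back-of-the-envelope sketch; the per-card port counts are from the text):

```python
# Back-of-the-envelope line-card bandwidth comparison from the figures above.
QSFP_PORTS_PER_CARD = 36   # per the article
CFP_PORTS_PER_CARD = 4     # per the article
RATE_GBPS = 40

qsfp_capacity = QSFP_PORTS_PER_CARD * RATE_GBPS   # 1440G per line card
cfp_capacity = CFP_PORTS_PER_CARD * RATE_GBPS     # 160G per line card
print(f"QSFP card: {qsfp_capacity}G vs. CFP card: {cfp_capacity}G "
      f"({qsfp_capacity // cfp_capacity}x the faceplate bandwidth)")
```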
Implementing 100G is much more complex
Much noise has been made at industry conferences about the imminent need for tens of thousands of 100G medium-reach links in the data center to support the upcoming "exa-flood" of traffic from server virtualization, big data, smartphones, tablets, and even software-defined networking. Ten-channel CXPs are used with multimode fiber, primarily by the large core switching companies, in both transceivers and AOCs. At 25G signaling for 4x25G, multimode noise spikes threaten to cut the reach of multimode transceivers to 50-70 m; these modules may therefore require forward error correction (FEC) and/or equalization to reach 125 m.
The 100G 2-km problem
Sending 100G more than 100 m has proven frustratingly hard to implement and has taken longer to develop than first expected. The IEEE 40/100G High Speed Study Group met in July and extended its study another six months to deal with the technical issues.
For longer reaches, engineers are wrestling with trying to fit all the optics and electronics into new MSA packages and hit all the power, size, electrical, and optical specs required. This goal is achievable -- but at what cost and power is still an open issue. CFP/2 is not a given! Much debate still centers on zCXP vs CFP/2 for the next MSA, with Molex and TE Connectivity backing zCXP. Meanwhile, silicon photonics companies such as Luxtera, Kotura, and LightWire/Cisco claim to be able to fit all the necessary optics and electronics into a QSFP!
It's very important for the IEEE to get the 25G-28G line-rate specifications right as it is a unique convergence point for a number of protocols: InfiniBand EDR at 26G, Ethernet at 25G, SAS 4.0 at 24G, Fibre Channel 28G, and telecom at 28G.
Today, there's no economically viable option for reaches of 100 to 600 m, and as data centers grow bigger, this is a hot area and the center of debate within the IEEE community. One extra meter can bump the transceiver OEM cost from a $1,000 CXP to a $14,000 telecom-centric CFP! Often referred to as 2 km, the reach for this application actually translates to between 400 and 600 m with an optical budget of about 4-5 dB in a lossy data center environment with patch panels and dirty connectors. To go 10 km, the link will need 6 dB. Next-generation lasers and CMOS electronics instead of SiGe are on the way, but Mother Nature just keeps getting in the way of our industry PowerPoint slides!
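A quick power-budget check shows how those 4-5 dB budgets get consumed. The budget figures are from the text; the per-kilometer attenuation and per-connector loss values below are illustrative assumptions, not standards numbers:

```python
# Rough optical power-budget check for a 100G medium-reach link.
# The 4-5 dB budget is from the article; the loss figures below are
# illustrative assumptions, not vendor or standards specifications.

FIBER_LOSS_DB_PER_KM = 0.4   # assumed singlemode attenuation
CONNECTOR_LOSS_DB = 0.5      # assumed loss per mated (possibly dirty) connector

def link_loss_db(length_km, num_connectors):
    """Total link loss: fiber attenuation plus connector/patch-panel losses."""
    return length_km * FIBER_LOSS_DB_PER_KM + num_connectors * CONNECTOR_LOSS_DB

# A 600 m run through four patch-panel connections:
loss = link_loss_db(0.6, 4)
budget = 4.5   # midpoint of the article's 4-5 dB data center budget
print(f"Loss: {loss:.2f} dB against a {budget} dB budget -> "
      f"{'OK' if loss <= budget else 'exceeds budget'}")
```

With these assumed values the fiber itself contributes little; it is the connector and patch-panel losses in a real data center that eat the budget, which is why a "2 km" spec shrinks to 400-600 m in practice.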
The next few years will involve Intel's Romley server architecture and subsequent silicon shrink, PCI Express 3.0, 10G uplinks to the top-of-rack switches, and 40G uplinks in the switching infrastructure. Supporting links at 40G will be where the money is for the next three years, but everyone can see that 100G will be the next stop -- with mid-board optics as well. IEEE will sort out the technical issues, and 100G infrastructure technology should kick in with volume in late 2014. It is important for the community to get this right as 100G will be around for a very long time.
Brad Smith is a senior vice president at LightCounting.com, a market research company forecasting high-speed interconnects. This article is an excerpt from the soon-to-be-released report, "40G & 100G Interconnects in the Data Center."