Designing optical integrated circuits for high-speed data links

Steve Vandris

High-speed serializers and deserializers are critical building blocks for optical networks. Design options for these broadband high-frequency circuits include potential enhancements as system needs force higher data rates.

The development of converged voice/data networks has created a strong need for optical integrated circuits (ICs) that can transmit and receive large quantities of data at serial data rates from 622 Mbit/s to 10 Gbit/s and eventually 40 Gbit/s. Current applications require the transmission and reception of point-to-point, non-return-to-zero (NRZ) data for tens of centimeters on printed circuit board backplanes for chip-to-chip communications or up to hundreds of kilometers over fiber for long-haul SONET/SDH transport.

Though specific needs vary by application, the basic requirements remain the same. Transmission of NRZ data requires the serialization of data bits with exact timing control, and reception requires deserialization with accurate timing recovery. These functions therefore require precision oscillators and clock/data recovery (CDR) circuits. The trade-offs among complexity, speed, power, and timing jitter, as well as the pertinent architectural and circuit-level implementation choices, are all important considerations.

Depending on the location of the transceiver in the network, there is a hierarchy of required speeds and timing accuracy for the related electronic components. The long-haul transport layer operates at speeds up to 10 Gbit/s (OC-192) and transmits/receives data according to the stringent SONET/SDH standards. The transmission of high-speed serial data between chips and shelves of equipment at up to 3.125 Gbit/s, by contrast, is governed by InfiniBand, XAUI, and related standards.

PHYSICAL LAYER DEVICES
The architecture of physical layer devices is reflected in serializer, CDR, and deserializer ICs:

Serializers. The transmission of NRZ data at rates greater than 1 Gbit/s starts by taking multiple low-speed channels and merging them into a single high-rate NRZ bitstream. Most schemes use a multiplexer (mux) after a parallel load register to choose which bit from the register is placed on the output line. Control of the mux can be provided by a counter driven from the full-rate clock or by multiphase clocks from a voltage-controlled oscillator (VCO) operating at a fraction of the full data rate (see Fig. 1).
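The register-plus-mux scheme with a counter as the mux control can be sketched in a few lines of Python. This is only a behavioral illustration, not a circuit description; the function name `serialize`, the word width, and the choice to send register bit 0 first are assumptions made for the example.

```python
from typing import Iterable, Iterator, List


def serialize(words: Iterable[List[int]], width: int) -> Iterator[int]:
    """Yield one bit per full-rate clock tick from parallel words.

    Hypothetical behavioral model: bit 0 of each word is sent first.
    """
    for word in words:
        if len(word) != width:
            raise ValueError("each word must contain exactly `width` bits")
        register = list(word)        # parallel load into the register
        for count in range(width):   # counter driven by the full-rate clock
            yield register[count]    # mux selects one register bit per tick


if __name__ == "__main__":
    print(list(serialize([[1, 0, 1, 1], [0, 0, 1, 0]], width=4)))
```

Two 4-bit words loaded in parallel leave the mux as the 8-bit stream 1, 0, 1, 1, 0, 0, 1, 0.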

To carry out the serialization, a high-speed low-jitter clock is required. Typically, the high-speed clock is generated locally, on chip for example, with a frequency-synthesizing phase-locked loop (PLL), also known as a clock multiplier unit (CMU). The CMU must have extremely low timing jitter to maximize the opening of the transmitted data eye. Clock jitter of around 1-ps rms is required for 10-Gbit/s serializers; future 40-Gbit/s systems will need to accommodate 250-fs rms requirements.
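Expressed as a fraction of the bit period (one unit interval, or UI), the two jitter figures come to the same budget of roughly 1% of a bit. The short calculation below shows this, assuming nominal line rates of exactly 10 and 40 Gbit/s; the function names are illustrative only.

```python
def unit_interval_ps(rate_gbps: float) -> float:
    """Bit period (one unit interval) in picoseconds for a given line rate."""
    return 1000.0 / rate_gbps


def jitter_in_ui(jitter_ps_rms: float, rate_gbps: float) -> float:
    """RMS clock jitter expressed as a fraction of the unit interval."""
    return jitter_ps_rms / unit_interval_ps(rate_gbps)


if __name__ == "__main__":
    # Nominal rates are an assumption; actual SONET rates differ slightly.
    for rate_gbps, jitter_ps in ((10.0, 1.0), (40.0, 0.25)):
        print(rate_gbps, unit_interval_ps(rate_gbps), jitter_in_ui(jitter_ps, rate_gbps))
```

At 10 Gbit/s the bit period is 100 ps, and at 40 Gbit/s it is 25 ps, so 1 ps and 250 fs each correspond to about 0.01 UI.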

The most straightforward architecture for the serializer uses a full-speed clock (that is, a clock frequency equal to the data rate). The advantage of a full-rate implementation is that all bits are transmitted using only one edge (for example, the rising edge) of the full-rate clock source. This type of architecture allows all transmitted bits to have a duration equal to the clock period, eliminating a potentially large cause of deterministic jitter.

This approach is used in the highest-performance designs, where there are few power limitations and the technology has the speed capability. Deterministic jitter occurs periodically because of certain data patterns, crosstalk, or mismatches in circuits or signals. Random jitter, on the other hand, is caused by thermal noise and other nonperiodic or non-data-dependent sources.

In situations in which power must be reduced or the technology cannot support a full-rate clock, serialization can be carried out with a multiphase low-frequency clock source. Although the low-frequency clock has multiple phases, the timing accuracy and random jitter requirements remain unchanged from the full-speed case. The upside of this approach is that the PLL that generates the clock phases can run at reduced speed, allowing lower power dissipation.

The best compromise between speed and performance could be to use both edges of a half-rate clock to execute a final 2:1 multiplexing. In a half-rate architecture, the mux output alternates between even and odd bits. Even bits drive the buffer when the clock is high, and odd bits drive the buffer when the clock is low. The final retiming flip flop can then be eliminated.
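A behavioral sketch of the final 2:1 stage is given below. It also illustrates why using both clock edges makes bit width depend on the clock duty cycle, which is the deterministic-jitter source that a single-edge, full-rate design avoids. The function names and the 48% duty cycle used in the demo are hypothetical values chosen for illustration.

```python
from typing import List, Tuple


def half_rate_mux(even: List[int], odd: List[int]) -> List[int]:
    """Interleave two half-rate streams into one full-rate stream."""
    if len(even) != len(odd):
        raise ValueError("even and odd streams must have equal length")
    out: List[int] = []
    for even_bit, odd_bit in zip(even, odd):
        out.append(even_bit)  # clock high: even bit drives the buffer
        out.append(odd_bit)   # clock low: odd bit drives the buffer
    return out


def bit_widths_ps(rate_gbps: float, duty_cycle: float) -> Tuple[float, float]:
    """Widths of the even and odd bits for a half-rate clock.

    The half-rate clock period spans two bits; the high time sets the
    even-bit width and the low time sets the odd-bit width.
    """
    clock_period_ps = 2.0 * 1000.0 / rate_gbps
    high_ps = clock_period_ps * duty_cycle
    return high_ps, clock_period_ps - high_ps


if __name__ == "__main__":
    print(half_rate_mux([1, 0], [0, 1]))
    print(bit_widths_ps(10.0, 0.50))  # ideal 50% duty cycle
    print(bit_widths_ps(10.0, 0.48))  # hypothetical duty-cycle error
```

With an ideal 50% duty cycle at 10 Gbit/s both bits are 100 ps wide; with the assumed 48% duty cycle the even and odd bits come out near 96 ps and 104 ps, so the eye crossings shift on alternate bits.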

Clock/data recovery and deserializers. Most circuits that recover clock timing and retime the NRZ data attempt to sample the input signal at the center of the bit interval. Clock recovery is then performed by comparing the phase of the data transitions to an on-chip VCO. Full-speed clock recovery architectures at 10 Gbit/s in high-performance technologies such as silicon germanium (SiGe) and gallium arsenide (GaAs) are well established; however, implementing serial transceivers in complementary metal-oxide semiconductor (CMOS) technology at speeds of 2.5 Gbit/s and higher requires new techniques (see Fig. 2).

In a clock recovery and demultiplexing (demux) architecture utilizing lower-speed circuitry with parallelization, phase comparisons of the N clock phases are performed vs. the data transitions and averaged before moving to the loop filter that controls the VCO. Then the data is deserialized immediately with the N clock phases and N flip flops.
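The deserializing step can be modeled behaviorally as N flip flops, each clocked by its own phase and therefore capturing every Nth bit. The sketch below assumes ideal sampling and ignores the phase-comparison loop; the function name and the handling of a trailing partial word are assumptions made for the example.

```python
from typing import List, Sequence


def deserialize(bits: Sequence[int], n_phases: int) -> List[List[int]]:
    """Split a serial stream into parallel words of n_phases bits.

    Lane k models the flip flop clocked by phase k, which captures
    bits k, k + N, k + 2N, ... A trailing partial word is dropped.
    """
    if n_phases < 1:
        raise ValueError("n_phases must be at least 1")
    usable = len(bits) - len(bits) % n_phases
    lanes = [list(bits[k:usable:n_phases]) for k in range(n_phases)]
    return [list(word) for word in zip(*lanes)]


if __name__ == "__main__":
    print(deserialize([1, 0, 1, 1, 0, 0, 1, 0], n_phases=4))
```

Each lane runs at 1/N of the line rate, which is what lets the VCO and the retiming flip flops operate well below the serial data rate.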

This architecture has the clear advantage that the VCO can operate at a fraction of the data rate, and the clock-to-Q delays of the retiming flip flops can be longer than a bit period. The foremost disadvantage of this approach is that mismatches in the clock phases and in the flip-flop setup and hold times can cause different bits to be retimed slightly off the center of the data eye (see Fig. 3).

Finally, in a digital clock recovery scheme, a multiphase VCO is locked to a reference clock, and the VCO frequency remains constant even if the average frequency of the data varies. A phase detector compares the data transitions to the selected clock phase and integrates the phase errors in a digital filter, often a simple up-down counter.

Any net phase error causes a new clock phase to be selected. If the frequency of the NRZ data is not equal to the reference clock, the phase selection algorithm will switch from one VCO phase to the next. The principal advantage of this approach is that the clock recovery mechanism is digital, usually yielding a smaller footprint and lower power (see Fig. 4).
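The phase-selection loop can be sketched as an up-down counter that steps the selected VCO phase whenever the accumulated early/late decisions reach a threshold. This is a minimal behavioral model; the class name, the sign convention for decisions, and the threshold value are all assumptions for illustration, and real designs differ in their filtering details.

```python
class PhaseSelector:
    """Up-down counter that picks one of n_phases clock phases.

    Hypothetical convention: decision +1 means the sampling clock should
    move to the next phase, -1 to the previous phase, 0 means no transition.
    """

    def __init__(self, n_phases: int, threshold: int) -> None:
        if n_phases < 1 or threshold < 1:
            raise ValueError("n_phases and threshold must be positive")
        self.n_phases = n_phases
        self.threshold = threshold
        self.counter = 0    # digital filter state
        self.selected = 0   # index of the currently selected VCO phase

    def update(self, decision: int) -> int:
        """Integrate one phase-detector decision; return the selected phase."""
        if decision not in (-1, 0, 1):
            raise ValueError("decision must be -1, 0, or +1")
        self.counter += decision
        if self.counter >= self.threshold:
            self.selected = (self.selected + 1) % self.n_phases
            self.counter = 0
        elif self.counter <= -self.threshold:
            self.selected = (self.selected - 1) % self.n_phases
            self.counter = 0
        return self.selected


if __name__ == "__main__":
    selector = PhaseSelector(n_phases=8, threshold=4)
    # A constant stream of +1 decisions models a small frequency offset.
    print([selector.update(+1) for _ in range(12)])
```

A persistent one-sided error, as produced by a frequency offset between the data and the reference clock, makes the selector walk steadily from one phase to the next and wrap around, while the VCO frequency itself never changes.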

Jitter transfer and jitter tolerance are two significant considerations in receiver design. High jitter tolerance means a receiver can recover data bits even if the data transitions deviate considerably from their nominal positions. Jitter transfer is the ratio of the jitter on the retimed output data to the jitter on the input data, as a function of jitter frequency.
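To make the definition of jitter transfer concrete, the sketch below evaluates it for a deliberately simplified model in which the recovery loop behaves as a first-order low-pass filter. That model, the function name, and the 1-MHz loop bandwidth in the demo are assumptions for illustration only; practical loops are typically higher order and can show peaking near the loop bandwidth.

```python
import math


def jitter_transfer_db(jitter_freq_hz: float, loop_bw_hz: float) -> float:
    """Output/input jitter ratio in dB for an assumed first-order loop."""
    ratio = 1.0 / math.sqrt(1.0 + (jitter_freq_hz / loop_bw_hz) ** 2)
    return 20.0 * math.log10(ratio)


if __name__ == "__main__":
    loop_bw_hz = 1.0e6  # hypothetical loop bandwidth
    for freq_hz in (1.0e4, 1.0e6, 1.0e7):
        print(freq_hz, round(jitter_transfer_db(freq_hz, loop_bw_hz), 2))
```

In this model, jitter well below the loop bandwidth passes to the output almost unchanged, while jitter well above it is attenuated at about 20 dB per decade.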

HIGH-SPEED CIRCUIT DESIGN
Except for the PLL in the serializer or deserializer, the circuits employed are digital, but only superficially. At high speeds, whether in CMOS, SiGe, or other technologies, each gate is a source of timing jitter due to intrinsic device noise or, more frequently, to crosstalk via the substrate or power supplies. For example, the output of a single CMOS static digital inverter can easily have 20 ps of jitter even though the input is jitter-free. Therefore, each digital gate in the serializer or deserializer must be viewed as an analog circuit.

To maximize immunity to power supply and substrate noise, fully differential topologies are almost always used in high-performance designs. On-chip voltage regulators are often used in conjunction with differential topologies for especially critical circuits such as the VCO.

Voltage-controlled oscillator. The VCO is a critical building block in both the CMU of the serializer and the CDR on the receive side. The inherent jitter, or equivalently phase noise, of the VCO in the PLL is the fundamental limitation to performance in the transmitter or receiver, so its design is critical. Two basic VCO structures are the four-stage differential ring oscillator and the LC-resonator-based oscillator. Ring-oscillator-based VCOs generally exhibit higher jitter than LC-resonator designs because they have no high-Q circuits to attenuate the thermal and flicker noise of the transistors. In contrast, LC-based oscillators have better jitter performance, which improves with increasing resonator Q (see Fig. 5).
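For a rough sense of scale, the oscillation frequency of each structure can be estimated from textbook first-order relations: an ideal LC tank resonates at 1/(2π√(LC)), and an N-stage ring with per-stage delay t_d oscillates at about 1/(2·N·t_d). The component values below are hypothetical and are chosen only to land near 10 GHz; they are not taken from any particular design.

```python
import math


def lc_frequency_hz(inductance_h: float, capacitance_f: float) -> float:
    """Resonant frequency of an ideal LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def ring_frequency_hz(n_stages: int, stage_delay_s: float) -> float:
    """First-order estimate for an N-stage ring oscillator."""
    return 1.0 / (2.0 * n_stages * stage_delay_s)


if __name__ == "__main__":
    # Hypothetical values: 1-nH inductor with 0.25 pF of total tank capacitance.
    print(lc_frequency_hz(1.0e-9, 0.25e-12) / 1e9, "GHz")
    # Hypothetical values: four stages with 12.5 ps of delay per stage.
    print(ring_frequency_hz(4, 12.5e-12) / 1e9, "GHz")
```

Both examples come out at roughly 10 GHz. The difference described in the text lies in how well each structure rejects device noise, not in the frequency it can reach.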

Input and output buffer. The purpose of the input buffers is to amplify the incoming high-speed serial data to a level that can be processed by the following clock and data recovery stage. The circuit should exhibit sufficient gain and bandwidth but also a fairly linear phase vs. frequency characteristic. A nonlinear phase response will cause group delay distortion, which introduces data-dependent jitter, or intersymbol interference (ISI), into the data stream.

The problem can be severe in applications such as SONET where the data can have long run lengths (up to approximately 72 consecutive 1s or 0s). A direct coupled buffer is preferable to a capacitively coupled one since the former does not have run length restrictions.
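The run-length restriction of a capacitively coupled input can be estimated with a first-order high-pass model: during a run of identical bits, the level after the coupling capacitor decays exponentially with time constant RC. The sketch below computes the fractional droop at the end of a run. The model, the function name, and the component values are assumptions for illustration.

```python
import math


def droop_fraction(run_length_bits: int, rate_gbps: float,
                   resistance_ohm: float, capacitance_f: float) -> float:
    """Fraction of signal level lost during a run of identical bits.

    Assumes a single-pole RC high-pass formed by the coupling capacitor
    and the termination resistance.
    """
    run_time_s = run_length_bits * 1.0e-9 / rate_gbps
    time_constant_s = resistance_ohm * capacitance_f
    return -math.expm1(-run_time_s / time_constant_s)


if __name__ == "__main__":
    # 72-bit run at a nominal 10 Gbit/s into 50 ohms (hypothetical values).
    print(droop_fraction(72, 10.0, 50.0, 100e-9))  # large external capacitor
    print(droop_fraction(72, 10.0, 50.0, 10e-12))  # small on-chip capacitor
```

With a 100-nF capacitor the droop over 72 bits is on the order of 0.1%, whereas a 10-pF capacitor loses nearly the entire signal level, which is why small coupling capacitors impose run-length limits that a direct coupled buffer avoids.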

A differential pair with resistive loads is often used as an output buffer because such a current-mode logic (CML) stage is power-efficient and fast. Ideally, the load resistors should be equal to the characteristic impedance of the transmission medium to eliminate reflections, and they can be made either programmable or automatically tunable to provide optimum matching. The parasitic capacitance at the output node combined with the load resistance sets an upper limit on the achievable rise/fall times.
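The limit set by the output node can be estimated by treating it as a single-pole RC circuit, for which the 10%-to-90% rise time is ln(9)·RC, or about 2.2·RC. The sketch below assumes that single-pole model; the 25-ohm effective resistance (a 50-ohm load in parallel with a 50-ohm line) and the 0.5-pF parasitic capacitance are hypothetical values.

```python
import math


def rise_time_ps(resistance_ohm: float, capacitance_pf: float) -> float:
    """10%-90% rise time of a single-pole RC node, in picoseconds.

    Ohms multiplied by picofarads gives picoseconds directly.
    """
    return math.log(9.0) * resistance_ohm * capacitance_pf


if __name__ == "__main__":
    # Hypothetical output node: 25 ohms effective, 0.5 pF parasitic.
    print(rise_time_ps(25.0, 0.5))
```

These values give a rise time of roughly 27 ps, a substantial part of the 100-ps bit period at 10 Gbit/s, so even a fraction of a picofarad of extra capacitance matters.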

Electrostatic discharge (ESD) protection diodes, particularly in CMOS, can add significant parasitic capacitance that compromises the bandwidth and reflection coefficient. Fortunately, parasitic capacitance at the output nodes can be partially compensated for by adding inductors to the load devices.

Input/output terminations and packaging. The amount of data that can be transported between chips or boards is limited by the bandwidth of the pin electronics, the interconnection medium, and the type of signaling used. For example, attempting to transport 3.125-Gbit/s CMOS logic signals across a backplane will result in severe ringing due to reflections in the unterminated transmission line.

To achieve the high data rates required, point-to-point serial links require well-terminated transmission lines. Terminating resistors are placed internally or externally to the chip to provide impedance matching at both ends of the transmission medium.
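The effect of termination accuracy can be quantified with the standard reflection coefficient, Γ = (Z_L − Z_0)/(Z_L + Z_0), where Z_0 is the line impedance and Z_L is the termination. The sketch below assumes purely resistive impedances; the function names and the example values are illustrative.

```python
import math


def reflection_coefficient(load_ohm: float, line_ohm: float) -> float:
    """Voltage reflection coefficient for a resistive termination."""
    if math.isinf(load_ohm):
        return 1.0  # unterminated (open) line reflects the full wave
    return (load_ohm - line_ohm) / (load_ohm + line_ohm)


def return_loss_db(load_ohm: float, line_ohm: float) -> float:
    """Return loss in dB; infinite for a perfect match."""
    gamma = abs(reflection_coefficient(load_ohm, line_ohm))
    return math.inf if gamma == 0.0 else -20.0 * math.log10(gamma)


if __name__ == "__main__":
    for load_ohm in (50.0, 60.0, math.inf):  # matched, 20% high, open
        print(load_ohm, reflection_coefficient(load_ohm, 50.0),
              return_loss_db(load_ohm, 50.0))
```

A matched 50-ohm termination reflects nothing, a 60-ohm termination reflects about 9% of the incident voltage, and an unterminated line reflects all of it, which is the source of the ringing described above.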

The package type and design have a considerable impact on the performance of the link. The skew between differential traces should be kept small to minimize common-mode signal generation, which can produce electromagnetic interference. Furthermore, the traces, along with the inductance of the bonding wire (in the case of wire-bonded devices), should appear as 50-ohm transmission lines to minimize reflections at the board interface. Finally, it is extremely beneficial to provide multiple power and ground planes in the package to isolate the quiet supplies from the noisy ones.

Steve Vandris is director of transport marketing, Multilink Technology, 300 Atrium Drive, Second Floor, Somerset, NJ 08873. He can be reached at [email protected].