Integrated 10G processors at the core of next-gen networks

The demand for intelligent, high-speed packet processing has led to the advent of programmable network processors, which provide system flexibility while delivering the performance required to process packets at speed.

Most architectures are based on the integration of multiple RISC processors into a silicon chip, but are limited to 1-2Gbit/s at layers 2-4.

Usually, such a network processor needs many supporting chips, mainly classification co-processors plus associated memory chips: external classifiers, content-addressable memories (CAMs) and static random access memories (SRAMs).

A 10Gbit/s interface for metro and edge applications would conventionally require one or two network processors, 10-15 CAMs and 10-15 SRAMs to implement packet processing, with total aggregate power dissipation of 100–115W and a cost of USD4,500–5,000.

Network processors from start-ups Terago Communications and Bay Microsystems can process up to layer four (enabling look-ups on the packet header only). AMCC's processor handles up to layer seven (for look-ups on the payload as well) and offers a complete packet processing solution comprising network processor, traffic manager and switch fabric. However, it is half-duplex, needing two chips as well as external CAMs and SRAMs to process layers 5-7.

EZchip (see panel) became the fourth supplier to ship 10Gbit/s network processors when, in April, it delivered its NP-1 10Gbit/s 7-layer network processor from IBM's silicon foundry in Burlington, VT, USA. In contrast, the NP-1 needs only separate traffic manager chips and some external 256MB DRAM chips to hold the look-up tables. Its structure is also full-duplex.

One customer has already designed the NP-1 into a system, demonstrated at Networld+Interop 2002, Las Vegas, providing a metro switch and interconnection to the IBM PowerPRS Q-64G switch fabric.

The NP-1's TOPcore architecture is based not on generic RISC processors but on integration of different types of high-speed and efficient Task Optimised Processors (TOPs), which incorporate both processing and classifying functions.

Four types of programmable TOP (TOPparse, TOPsearch, TOPresolve and TOPmodify) are each optimised for one of the main tasks of packet processing. Each type employs a unique architecture with a customised, function-specific data path and instruction set, so the complexity of packet processing is reduced to four smaller independent tasks. This minimises the number of clock cycles for packet manipulation and provides fast 7-layer packet processing.

Performance is boosted by a super-scalar architecture, optimised for packet processing, in which TOPs are organised for simultaneous parallel and pipeline processing.

Each TOP type processes frames at its own pipeline stage. Pipelining enables the passing of messages and pointers to packets from one processing stage to the next. Each TOP performs its particular task and passes its results to the next stage.
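The stage-to-stage hand-off described above can be illustrated with a minimal Python sketch. It is a toy model, not EZchip's programming interface: the `Frame` fields, the `TABLE` look-up data and the function names `top_parse`, `top_search`, `top_resolve` and `top_modify` are all hypothetical, chosen only to mirror the four TOP types named in the article.

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    raw: bytes
    keys: dict = field(default_factory=dict)     # filled by the parse stage
    matches: dict = field(default_factory=dict)  # filled by the search stage
    action: str = ""                             # filled by the resolve stage
    out: bytes = b""                             # filled by the modify stage

# Hypothetical look-up table: first byte of the frame -> output port.
TABLE = {0x0A: 1, 0x0B: 2}

def top_parse(f: Frame) -> Frame:
    # Extract the key that later stages will need.
    f.keys["dst"] = f.raw[0]
    return f

def top_search(f: Frame) -> Frame:
    # Look the key up; None means no table entry matched.
    f.matches["port"] = TABLE.get(f.keys["dst"])
    return f

def top_resolve(f: Frame) -> Frame:
    # Turn the search result into a forwarding decision.
    port = f.matches["port"]
    f.action = "drop" if port is None else f"forward:{port}"
    return f

def top_modify(f: Frame) -> Frame:
    # Rewrite the first byte with the chosen port; dropped frames get no output.
    if f.action.startswith("forward:"):
        port = int(f.action.split(":")[1])
        f.out = bytes([port]) + f.raw[1:]
    return f

PIPELINE = (top_parse, top_search, top_resolve, top_modify)

def process(raw: bytes) -> Frame:
    # Each stage does its own task and hands its result to the next.
    f = Frame(raw)
    for stage in PIPELINE:
        f = stage(f)
    return f
```

Calling `process(b"\x0a\x01\x02")` walks one frame through all four stages. In the real device the stages run concurrently on different frames, which this sequential sketch does not attempt to model.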

Multiple TOPs at the same pipeline stage enable simultaneous processing of multiple packets. TOPs of the same type execute the same code, but to maximise performance each TOP operates independently.

TOPs of each type are employed as shared resources without being tied to a physical NP-1 port. An integrated hardware scheduler dynamically assigns the next available TOP to the next incoming packet, and packet ordering is maintained automatically.
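As a rough software analogy for this scheduling scheme, the sketch below dispatches packets to a shared pool of workers and uses sequence numbers plus a small reorder buffer to release results in arrival order. The NP-1 does this in hardware, and the article does not describe the mechanism, so the reorder-buffer approach and the names `work` and `schedule` are assumptions made for illustration.

```python
import heapq
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def work(seq, packet):
    # Simulate variable processing time so completions arrive out of order.
    time.sleep(random.uniform(0, 0.01))
    return seq, packet.upper()

def schedule(packets, n_tops=4):
    """Dispatch packets to any free worker; release results in arrival order."""
    released = []
    pending = []   # min-heap of (seq, result) waiting on earlier packets
    next_seq = 0   # sequence number of the next packet allowed to leave
    with ThreadPoolExecutor(max_workers=n_tops) as pool:
        # Workers are a shared pool: no packet is tied to a particular worker.
        futures = [pool.submit(work, seq, p) for seq, p in enumerate(packets)]
        for fut in as_completed(futures):
            heapq.heappush(pending, fut.result())
            # Drain every result that is now in sequence.
            while pending and pending[0][0] == next_seq:
                released.append(heapq.heappop(pending)[1])
                next_seq += 1
    return released
```

Whatever order the workers finish in, `schedule(["a", "b", "c"])` returns the results in the order the packets arrived. The calling code never sees the reordering.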

Parallel processing at each pipeline stage is completely transparent to the programmer, as are the allocation of TOPs to incoming frames, the passing of results, messages and frame pointers from one pipeline stage to the next, and the maintenance of frame ordering.

The NP-1 uses multiple embedded memory cores, which together provide an aggregate bandwidth of hundreds of Gbit/s, for packet buffering and queuing, for storing look-up tables, and for the high-bandwidth memory accesses needed in packet processing and classification. The memory cores are accessed in parallel by the various TOPs and provide the bandwidth required for sustaining 7-layer wire-speed throughput.

In total, the NP-1's embedded TOPsearch engines, coupled with EZchip's search algorithms, enable searching through more than 1 million entries at 10Gbit/s wire speed, for implementing diverse applications involving layer 2-4 switching and routing as well as layer 5-7 deep packet processing.
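To give a feel for what 10Gbit/s wire speed demands, the following back-of-envelope calculation estimates the packet rate and the time available per packet. It assumes Ethernet framing with minimum-size 64-byte frames and the standard 20 bytes of preamble and inter-frame gap. The article does not state the line format, so these figures are an illustrative assumption, and the function names are hypothetical.

```python
LINE_RATE_BPS = 10e9      # 10Gbit/s line rate, as quoted in the article
MIN_FRAME_BYTES = 64      # minimum Ethernet frame (assumes Ethernet traffic)
OVERHEAD_BYTES = 8 + 12   # preamble/start delimiter plus minimum inter-frame gap

def packets_per_second(frame_bytes=MIN_FRAME_BYTES):
    # Every frame occupies its own bytes plus the fixed per-frame overhead.
    bits_on_wire = (frame_bytes + OVERHEAD_BYTES) * 8
    return LINE_RATE_BPS / bits_on_wire

def time_budget_ns(frame_bytes=MIN_FRAME_BYTES):
    # Time between back-to-back frame arrivals, in nanoseconds.
    return 1e9 / packets_per_second(frame_bytes)
```

Under these assumptions the worst case is about 14.9 million packets per second, or roughly 67ns per packet. This is the interval in which a new packet arrives, so every pipeline stage, including table look-ups, has to accept a packet at that rate.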

Israel-based network processor provider LanOptics has struck a deal with private equity investment group Apax Partners that will allow it to increase its shareholding in San Jose-based fabless semiconductor subsidiary EZchip Technologies (founded 1999) from 57% to 66% by exchanging Apax Partners' shares in EZchip for 1,153,508 newly-issued LanOptics shares.

"EZchip's NP-1 processor, now in production and shipping, represents a tremendous cost savings for OEMs," says LanOptics chairman Dr Meir Burstin. Allan Barkat, Managing Director of Apax Partners in Israel, adds, "The exchange will allow us to hold shares in a publicly-traded company whose future is directly linked to EZchip."