Effective troubleshooting of FCoE and iSCSI at 40G

The analyzer is a key tool for Fibre Channel over Ethernet (FCoE) and Internet Small Computer System Interface (iSCSI) developers designing reliable and efficient 40G systems. Users have increasingly begun to turn to hardware-based analyzers for accurate, efficient FCoE and iSCSI troubleshooting.


The most important factor in effective troubleshooting is being able to trust your data. When your data is wrong, you have to add a step to the debugging process: determining whether a problem is real before you dedicate resources to trying to resolve it.

For example, consider a switch with four 10G ports aggregated to a single 40G port. An analyzer that can capture data on the 10G ports may not be able to maintain line rate on the 40G port without dropping any packets. Suddenly it looks like the switch is dropping packets when it’s really a shortcoming of the analyzer.

Software analyzers, also known as sniffers, modify the protocol stack to intercept data. This is an intrusive process, given that the analyzer runs on the same CPU that is passing traffic, thus affecting performance and reliability. A network interface card (NIC), for example, has only enough capacity to perform the tasks for which it was designed. When CPU cycles are diverted to capture and analyze traffic, the NIC will no longer be able to operate at wire speed, resulting in congestion and dropped packets.

Throughput can be a problem for software-based analyzers even at less than line rate. For example, the CPU has other tasks for which it has reserved memory and system resources like storage bandwidth. As a result, software analyzers cannot guarantee available processing cycles or memory bandwidth, and packets may have to be dropped even when the device under test is operating with light traffic loads.

For these and other reasons that we’ll discuss below, users have increasingly begun to turn to hardware-based analyzers for accurate, efficient FCoE and iSCSI troubleshooting.

Packet loss

Phantoms in the network

“You have to have a reliable analyzer that you can trust, or you may find yourself in serious trouble trying to solve problems that aren’t really there.” Lou Dickens, protocol test engineer at a Fortune 100 company, talks from experience. He analyzes Fibre Channel, iSCSI, and FCoE traffic on a daily basis, depending upon what he is currently testing.

“When packets are dropped by your analyzer, you can find yourself trying to solve a phantom problem that isn’t really there.” For example, if the analyzer drops the packet that indicates an exchange is complete, it appears as if the completion packet was never sent. “Now you’ve got a false target issue pulling your attention away from the real problem,” he explains.

Lost packets can manifest as a variety of intermittent problems that are difficult to repeat consistently and even more difficult to resolve. “You can lose weeks tracking a phantom problem through the network trying to find the source. It’s even worse when you’re working as a team because everyone has a different idea of what’s gone wrong,” Dickens adds.

Bad data can lead to other problems as well. “One time the team replaced all the cables trying to fix a problem and, in the rush, put a bad one in, which created a whole new set of problems that weren’t there before,” Dickens says. Even worse than wasting time debugging phantom problems is fixing them. “Your analyzer dropping a packet can lead you to the wrong conclusions and the wrong fixes. Now you’ve got another problem to clean up after,” says Dickens.

Reliability is even more important in the field. “When I’m troubleshooting at a customer site, their IT department has to approve any additions to make sure they aren’t going to bring down the network,” Dickens states. “In one case, it took a month just to get an analyzer in place. If we had to deal with phantom problems as well, we would need to request moving the analyzer throughout the network. You’d be surprised at how fast six months can go by.”

For these reasons, “we don’t use software analyzers because we can’t afford to be sidetracked like that,” Dickens asserts. “We have found over the years, hardware is far more stable than software, as a general rule.” Dickens has experienced firsthand the cost of product delays caused by analyzer-induced phantom errors. In one case, product shipment was delayed at the cost of thousands of dollars per day.

Another time it was a field problem with huge visibility that could affect the company’s reputation. “We had 30 people tied up trying to solve a problem that wasn’t even there,” Dickens recalls. “That’s why it’s so critical that the information you act on is accurate. If it isn’t, you can chase your tail for months.”

For lossless protocols like Fibre Channel (FC) and FCoE, packet loss cannot be tolerated at any level, so software analyzers are simply not an option in FC applications. However, dropped packets are a potential problem even for lossy protocols like Ethernet and iSCSI (see the sidebar “Phantoms in the network”). With iSCSI, for example, iSCSI messages can be striped across multiple packet payloads, and the analyzer must follow every packet through a particular connection to keep track of where the iSCSI headers are. If the analyzer drops a packet, there won’t be a retransmission as there is when the link drops a packet. The loss of even a single packet could prevent reassembly of messages, and analysis tools will lose the ability to visualize what is passing over the link. Thus, even though iSCSI is a lossy protocol, analysis of iSCSI must be lossless.
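One way to picture the difference between a link drop and an analyzer drop is to look for byte ranges that never appear anywhere in the trace. The sketch below is illustrative only: the function name and the `(sequence_number, payload_length)` input format are assumptions, it covers one direction of one TCP connection, and it ignores sequence-number wraparound. A gap that a later retransmission fills points to loss on the link; a gap that is never filled, while the connection carries on normally, suggests the capture itself missed the packet.

```python
from typing import List, Tuple


def unfilled_gaps(segments: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return (start, end) byte ranges that never appear in the capture.

    segments holds (sequence_number, payload_length) pairs for one direction
    of one TCP connection, in any order. Retransmissions are simply extra
    entries that cover the same bytes again.
    """
    # Convert to half-open byte ranges and sort by starting sequence number.
    ranges = sorted((seq, seq + length) for seq, length in segments if length > 0)
    gaps = []
    covered_end = ranges[0][0] if ranges else 0
    for start, end in ranges:
        if start > covered_end:
            # Nothing in the whole trace covers these bytes.
            gaps.append((covered_end, start))
        covered_end = max(covered_end, end)
    return gaps


# A hole at 2000 that a later retransmission fills: no unfilled gap.
link_loss = [(1000, 500), (1500, 500), (2500, 500), (2000, 500)]
# A hole at 1500 that is never filled anywhere in the trace.
capture_loss = [(1000, 500), (2000, 500)]
print(unfilled_gaps(link_loss), unfilled_gaps(capture_loss))
```

An unfilled gap is a hint rather than proof, since a trace that simply ends before the retransmission arrives looks the same.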

A hardware analyzer works in-line with traffic. A 1:2 splitter serves as the front end, passing traffic to its destination while a copy is passed to analyzer hardware for processing and storage. Hardware analyzers are able to guarantee 100% capture at line rate because they perform their function in dedicated hardware rather than loading and impeding the device under test. When they are also non-blocking, traffic passes through the analyzer transparently without materially affecting network operation. Other than minimal latency (delay through a hardware analyzer ranges from 100 ps to a few nanoseconds), it’s as if the analyzer isn’t there. However, because the analyzer is just listening, it has no control over traffic and no ability to apply backpressure.

To achieve non-blocking functionality, a hardware analyzer must have sufficient memory bandwidth to store not just packets but the metadata associated with each packet, including timestamps and error flags. To prevent dropped packets, the system must also have enough memory to sustain throughput during worst-case traffic conditions.
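To give a rough sense of the worst case, the arithmetic below estimates the packet rate and capture write rate for one direction of a 40G link saturated with minimum-size Ethernet frames. The frame and wire overhead figures are standard Ethernet values; the 16 bytes of per-packet metadata is purely an assumption for illustration, not a figure from any particular analyzer.

```python
LINE_RATE_BPS = 40e9        # 40G Ethernet line rate, one direction
MIN_FRAME_BYTES = 64        # minimum Ethernet frame, including FCS
WIRE_OVERHEAD_BYTES = 20    # preamble and start delimiter (8) + inter-frame gap (12)
METADATA_BYTES = 16         # ASSUMPTION: timestamp and error flags stored per packet


def worst_case_packets_per_second(line_rate_bps: float = LINE_RATE_BPS) -> float:
    """Packet rate when the link carries back-to-back minimum-size frames."""
    bits_on_wire_per_packet = (MIN_FRAME_BYTES + WIRE_OVERHEAD_BYTES) * 8
    return line_rate_bps / bits_on_wire_per_packet


def capture_write_rate(metadata_bytes: int = METADATA_BYTES) -> float:
    """Bytes per second written to capture memory: frame plus metadata."""
    return worst_case_packets_per_second() * (MIN_FRAME_BYTES + metadata_bytes)


print(f"{worst_case_packets_per_second() / 1e6:.2f} Mpps")   # 59.52 Mpps
print(f"{capture_write_rate() / 1e9:.2f} GB/s per direction")  # 4.76 GB/s per direction
```

Under these assumptions the metadata adds a quarter again to the stored size of each minimum-size frame, which is why small-packet traffic rather than bulk transfer tends to define the memory bandwidth requirement.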

Visibility

With a hardware analyzer, developers can see the raw 66-bit scrambled signal on the line. This means they can set triggers on any aspect of traffic, including primitives and packet order, not just at the protocol layer. Triggering is non-blocking as well, even with multi-stage triggers, since dedicated state machines in hardware are used. Timestamp accuracy is also improved since triggers are resolved quickly and do not add overhead as is the case with software-based approaches.

Software analyzers have only limited access to what is actually passing over the wire. For example, low-level primitives are not accessible at the stack level because they have already been stripped off by the NIC. The timing of packets is affected too, since there is a delay between when the packet reaches the NIC and when it is passed to the protocol stack.

Visibility can be impaired in the other direction as well. For example, a software analyzer cannot probe beyond the NIC into a switch. This means that developers cannot observe what is happening when 10G ports are aggregated into a 40G port. In contrast, hardware analyzers can provide visibility at every point in the traffic chain, from both before and after the NIC to the server/switch and through aggregation points. A single analyzer with multiple ports can also verify traffic in and out of a target and automatically correlate and compare results.

A single hardware analyzer is also able to support multiple protocols at different speeds. This gives developers greater flexibility and allows them to leverage tool investment across multiple applications. The ability to use a single tool to debug multiple protocols also enables seamless correlation of traffic across ports and protocols in a way not possible when separate analyzers are used. This is critical for analyzing traffic that crosses domains, such as FCoE.

Advanced troubleshooting
Just having access to captured data, however, is not always enough to know what is stored in a packet. For example, iSCSI protocol data units (PDUs) are carried in the TCP byte stream and can be spread across multiple packets. To be able to analyze traffic, developers need to be able to identify where PDUs are and aggregate them. However, if there is a retransmission of a packet, at 40G that packet may be hundreds of millions of events further down the trace. Manually reassembling multi-packet PDUs into a single message is a tedious and error-prone process. In addition, ambiguities can arise as to whether an entire packet was lost or just a single PDU.
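As a minimal sketch of what automatic PDU reassembly involves, the code below walks an in-order TCP byte stream and cuts it into iSCSI PDUs using the length fields of the 48-byte Basic Header Segment. It assumes header and data digests are not negotiated, that payloads arrive already ordered and de-duplicated, and that the stream starts on a PDU boundary; the function names are hypothetical and do not represent any analyzer's implementation.

```python
from typing import Iterable, Iterator

BHS_LEN = 48  # iSCSI Basic Header Segment length in bytes


def pdu_length(bhs: bytes) -> int:
    """Total PDU length in bytes, assuming no header or data digests."""
    ahs_len = bhs[4] * 4                        # TotalAHSLength, in 4-byte words
    data_len = int.from_bytes(bhs[5:8], "big")  # DataSegmentLength, in bytes
    padded_data_len = (data_len + 3) & ~3       # data segment is padded to 4 bytes
    return BHS_LEN + ahs_len + padded_data_len


def split_pdus(payloads: Iterable[bytes]) -> Iterator[bytes]:
    """Yield complete PDUs from the in-order TCP payloads of one connection."""
    buf = bytearray()
    for payload in payloads:
        buf += payload
        # A single payload may complete several PDUs, or none at all.
        while len(buf) >= BHS_LEN:
            total = pdu_length(bytes(buf[:BHS_LEN]))
            if len(buf) < total:
                break  # PDU continues in a later packet
            yield bytes(buf[:total])
            del buf[:total]
```

Because each header is located only by adding up the lengths of the PDUs before it, a single payload missing from the input shifts every later boundary, which is the framing failure that a dropped packet causes.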

When the analyzer can handle processes like this automatically and without packet loss, it can save tremendous effort on the part of developers. To ease troubleshooting, all relevant data can be seen within a small window that shows traffic as it was transmitted (see Figure 1), enabling each PDU to be verified as well as shown with the data with which it is associated. In addition, a huge buffer is no longer required as only data of interest needs to be captured.

Figure 1. iSCSI multi-PDU screenshot. Proper presentation of test data can speed analysis and trouble resolution.

Two key capabilities for enabling efficient data capture and debugging are multi-state triggers and enhanced filtering. Consider that even though a session may run over a short time, at 40G a lot of data passes through the system. Triggers and filters pare down the amount of information that the analyzer captures and that developers need to sift through to resolve an issue.

For example, developers can more clearly define when to begin capturing traffic by configuring triggers to very specific conditions. Enhanced filtering complements multi-state triggers by automatically removing traffic that is not of interest, such as data to devices that are not being debugged. As a result, the trace buffer required to capture even a simple connection will contain just traffic that is relevant rather than spanning gigabytes of data.
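The interplay of a multi-state trigger and a filter can be modeled in a few lines of software. This is a conceptual sketch only: a hardware analyzer implements the same idea in dedicated state machines, and the `Packet` fields, the `capture` function, and the example conditions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence


@dataclass
class Packet:
    dst: str   # hypothetical destination identifier
    kind: str  # hypothetical decoded packet type, e.g. "LOGIN" or "WRITE"


Predicate = Callable[[Packet], bool]


def capture(packets: Iterable[Packet],
            stages: Sequence[Predicate],
            keep: Predicate) -> List[Packet]:
    """Store filtered packets once every trigger stage has matched in order."""
    stage = 0
    captured: List[Packet] = []
    for pkt in packets:
        if stage < len(stages) and stages[stage](pkt):
            stage += 1                # advance to the next trigger state
        if stage == len(stages) and keep(pkt):
            captured.append(pkt)      # triggered: keep only traffic of interest
    return captured


traffic = [Packet("A", "READ"), Packet("A", "LOGIN"), Packet("B", "WRITE"),
           Packet("A", "WRITE"), Packet("B", "READ"), Packet("A", "READ")]
trace = capture(
    traffic,
    stages=[lambda p: p.kind == "LOGIN",
            lambda p: p.kind == "WRITE" and p.dst == "A"],
    keep=lambda p: p.dst == "A",
)
print([p.kind for p in trace])  # ['WRITE', 'READ']
```

In this model the packet that satisfies the final stage is itself stored, and everything before the trigger or addressed to device B never reaches the trace buffer.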

Flexibility in analyzer output is important as well; if captured data is locked to a proprietary tool, developers cannot take advantage of their offline analysis tools of choice. Analyzers that enable export of data to tools like Wireshark that support custom analysis enable teams to leverage their existing test benches.

Measurement of latency and performance verification in 40G systems requires accurate timestamps as well. Today, hardware analyzers offer timestamps with accuracy to 1 ns. This capability enables developers to identify potential bottlenecks, analyze flow control issues, accurately measure port-to-port delay, and even explore low-level interactions at the link and PHY layers that can affect performance.
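As an illustration of how port-to-port delay falls out of timestamped captures, the sketch below matches packets seen on an ingress port and an egress port and subtracts their timestamps. The `(packet_id, timestamp_ns)` record format is an assumption; in practice the identifier might be a hash of fields that do not change in transit, and both ports must share one timebase.

```python
from typing import Dict, Iterable, Tuple

Record = Tuple[int, int]  # (packet_id, timestamp_ns), a hypothetical export format


def port_to_port_delays_ns(ingress: Iterable[Record],
                           egress: Iterable[Record]) -> Dict[int, int]:
    """Delay in ns for every packet seen on both ports."""
    time_in = dict(ingress)
    return {pid: t_out - time_in[pid] for pid, t_out in egress if pid in time_in}


def summarize(delays: Dict[int, int]) -> Tuple[int, int, float]:
    """Return (minimum, maximum, mean) delay in ns."""
    values = list(delays.values())
    return min(values), max(values), sum(values) / len(values)


ingress = [(1, 1000), (2, 2000), (3, 3000)]
egress = [(1, 1450), (2, 2500), (4, 9000)]
delays = port_to_port_delays_ns(ingress, egress)
print(delays)             # {1: 450, 2: 500}
print(summarize(delays))  # (450, 500, 475.0)
```

The spread between the minimum and maximum delay gives a first look at variation through the device under test, and packets that appear on only one port (3 and 4 here) are candidates for a closer look.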

Non-intrusive probing
Variations in how long it takes to pass traffic through the analyzer introduce unwanted jitter to traffic. With software analyzers, high-priority host tasks like interrupts and locked access to storage or memory can increase latency. Advanced triggering and filtering, if supported, will add delay as well. In addition, because triggering introduces loading on the CPU, software analyzers do not support the complex multi-state triggers developers need to identify and locate performance and reliability issues.

Hardware analyzers avoid introducing jitter by minimizing added latency. Because processing is performed by dedicated hardware, this latency is also deterministic and consistent, eliminating any issues that might arise from jitter. The result is little to no impact on traffic.

To be completely non-intrusive, hardware analyzers also need to redrive signals rather than retime them. Retimers have the potential to materially alter traffic by adding or removing symbols when passing the traffic through. A redriver, in contrast, retransmits traffic without retiming what was received from the optics and so preserves traffic integrity.

The sheer amount of data to analyze brings unique challenges to designing 40G systems. Whether the protocol is lossless, like FCoE, or tolerates packet loss, like iSCSI, packets dropped by an analyzer can create phantom problems that distract developers from real issues, potentially causing costly product delays. With deep visibility and 100% data capture at line rate, developers can effectively troubleshoot systems to ensure optimal performance and reliability.

Simon Thomas is product manager at Teledyne LeCroy.
