Measuring latency in equity transactions

We’ve all heard the phrase “time is money.” The evolution of data centers toward distributed and cloud computing has become a key factor in network design, especially in financial markets, where “time” is literally “money.” While key to profitability for financial networks, service and network latency is becoming a critical factor in the design and management of data centers of all types.

The meaning of “latency” differs depending on the area of application. In electronic networks, it refers to the time between two events. For example, latency can be the time between the first bit of a packet entering a network switch at one port and the first bit appearing at another port. This time can be on the order of 10 ns. Latency can also be the time between the start of the transmission of a market trade and receipt of the complete trade acknowledgement packet. This time can be on the order of 100 ms. It can even be the time between a packet leaving New York City and arriving in London. This time can be on the order of tens of milliseconds.

But why is latency important? In financial transactions, latency affects the time between the initiation of a transaction and its execution. For example, when Facebook went public in May, NASDAQ experienced latency, leaving several investors wondering if their transactions had been executed. Now, NASDAQ OMX Group is looking into setting up a $40 million fund to compensate those who lost money in Facebook’s IPO. This is a perfect example of how any additional latency can literally cost money.

A related term, “jitter,” is the amount of variation in latency. An exchange may have an average latency of 100 ms, but if its jitter is 500 ms or more, then it will be viewed as less dependable than systems with less jitter. Sensitivity to latency and jitter varies by the type of data being transmitted. Voice over IP (VoIP) traffic, for example, has a very low bandwidth requirement (less than 56 kbps) but is very sensitive to latency. If more than 50 ms of latency occurs, the human ear immediately hears the delay. Streaming video applications have a larger bandwidth requirement (about 500 kbps), but are far less sensitive to latency and jitter due to various buffering techniques in the transmission and reception networks and devices. Streaming video can tolerate up to 5% loss and up to 5 s of latency.
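As a rough illustration of the difference between latency and jitter, the sketch below summarizes two sets of latency samples that share the same average. The sample values and the function name are hypothetical, and jitter is taken here as the standard deviation of the samples, which is only one of several definitions in common use.

```python
import statistics

def summarize_latency(samples_ms):
    """Return mean latency, jitter (population std dev), and spread, all in ms."""
    mean = statistics.fmean(samples_ms)
    jitter = statistics.pstdev(samples_ms)      # assumed definition of jitter
    spread = max(samples_ms) - min(samples_ms)  # worst-case variation
    return {"mean_ms": mean, "jitter_ms": jitter, "spread_ms": spread}

# Two hypothetical exchanges with the same 100 ms average latency
steady = [98, 101, 100, 99, 102]
erratic = [20, 350, 15, 100, 15]

print("steady :", summarize_latency(steady))
print("erratic:", summarize_latency(erratic))
```

Both lists average 100 ms, but the second varies far more, which is why it would be viewed as the less dependable of the two.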

What causes latency?
Latency is a fact of life, due mostly to the number and types of devices involved in a transaction. Data center, metro, and wide area networks all generate delay due to the time required to send data over a distance and through networking devices such as routers and switches. Additional latency is caused by the processing of requests by computers and their associated storage media. Trading latency can be broadly divided into transmission, networking, and computational delays:

Transmission delays are associated with transmitting light over fiber. The latency of any connection is determined by physics; fiber-optic network transmission occurs at approximately two-thirds of the speed of light in a vacuum (because of the refractive index of the glass), which equates to 5 µs of latency per kilometer. Yet a look at the numbers and their variation indicates that there’s more going on than the speed of transmission. The distance between Los Angeles and New York, for example, is about 4000 km, which should equate to a light transmission delay of 20 ms. Yet this is less than a third of the 68-ms latency observed by AT&T. This is where the other causes of latency come in.
Networking delays are caused by optical and copper cables, and networking devices. Most long connections between data centers take place over fiber-optic cables: single-mode for longer distances and multi-mode for shorter distances. Within the data center and between racks, copper cabling is frequently used for cost considerations. The transmission of Ethernet signals requires 5 µs per kilometer. Some data center connections are up to a kilometer in length. Although 5 µs is a small amount, we will see that this delay is of the same order of magnitude as that seen in some Ethernet switches. Information must go through a number of networking devices between parties. The devices vary in function, sophistication, and the amount of latency that they contribute. Networking devices, primarily switches and routers, process each packet and make a forwarding decision. This process involves parsing the packet, performing a lookup in a forwarding table, potentially updating the packet (for example, rewriting the destination MAC address and decrementing the TTL), and potentially modifying it (for example, changing the class of service or a source/destination port number). The processing time of each packet can vary greatly depending on the forwarding path and whether it is performed in hardware. If there is oversubscription, there may also be buffering, which adds delay. Most vendors are now producing purpose-built switches with features for data center environments to minimize latency.
Computational delays are associated with processing the trades. Improving computational delays is the province of the trading application developers, along with their software suppliers. In some situations where the amount of market data and trading increases dramatically, computational delays can grow along with network delays. Such scenarios include market opening and closing, breaking news, and algorithmic and high-frequency trading. These events increase computational delay, but also increase network traffic that can result in additional delays. Such events also result in jitter, which wreaks havoc on trading algorithms that depend on specific latencies for specific markets.
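The transmission-delay arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch that uses only the figures quoted in the text (5 µs per kilometer, 4000 km, 68 ms) and treats the observed figure as directly comparable to the one-way propagation delay, as the text does. The function names are illustrative.

```python
FIBER_DELAY_US_PER_KM = 5.0  # per-kilometer figure quoted in the text

def propagation_delay_ms(distance_km, us_per_km=FIBER_DELAY_US_PER_KM):
    """One-way propagation delay over fiber, in milliseconds."""
    return distance_km * us_per_km / 1000.0

def unexplained_delay_ms(observed_ms, distance_km):
    """Part of an observed latency that propagation alone does not explain."""
    return observed_ms - propagation_delay_ms(distance_km)

# Los Angeles to New York, using the distance and observation from the text
print(f"Propagation delay: {propagation_delay_ms(4000):.0f} ms")
print(f"Left for networking and computation: {unexplained_delay_ms(68, 4000):.0f} ms")

# A 1 km run inside a data center
print(f"1 km link: {propagation_delay_ms(1) * 1000:.0f} µs")
```

The remainder after subtracting propagation is the portion that networking and computational delays have to account for, and it is the portion that testing can help reduce.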

Measuring latency
To improve latency, it’s essential to first understand the latencies of the network’s component systems in order to know where to focus improvement efforts. Vendor-provided specifications usually do not provide a true picture of what will happen when their devices are subjected to a particular data center’s workload. Different usage models will result in different workloads for individual devices, subsystems, and the entire system. Testing is the only way to provide the means and measurements for identifying sources of latency.

Component selection and characterization testing: Characterization testing occurs when a new system is being built or an expansion is planned and component selection is a critical part of the data center design. The specifications of networking, computing, and storage components need to be verified, as those that come from the equipment manufacturers may not be realistic. To establish the proper performance parameters that accurately relate to a site’s activity, all components must be tested with real-world network traffic and applications.

Component selection testing is also essential in ensuring interoperability between components. Although network components from different vendors have become relatively interchangeable, there’s always a possibility that network protocols are implemented differently. For example, the TRILL protocol is a relatively new protocol for optimizing networking paths within a data center. As the protocol matures, there are opportunities for implementation differences by different vendors. Conformance testing, as it’s called, confirms that network devices adhere to published standards.

Pre-deployment testing: This generally takes place in a special development or test lab that recreates a subset of the live system. This category applies to component selection and characterization testing as well as to complete systems and subsystems. As in component testing, it is essential that real-world conditions be used. The principal technique used for this type of testing is called “emulation,” wherein interactions are simulated via software. For example, a web server that is used to host customer-facing account management should be tested by performing the same interactions that an end user’s browser would have with that server.

Emulation is very useful for testing larger subsystems. For example, after measuring web server performance, it would be logical to test the performance of multiple servers with their associated application delivery controllers (ADCs). Continuing the technique, after testing the security and switching subsystems, an entire system could be performance-tested to determine the latency of interactions from the firewall at the front door through to the web server and its back-end components. Pre-deployment testing also proves interoperability on a larger scale.

Full system test: This type of testing uses the complete system when it is offline. While pre-deployment labs are an essential part of deploying and maintaining a large-scale system, they usually represent a small subset of the larger system that they support. There’s no substitute for the real thing. Where the opportunity exists, the full system should be used for end-to-end system test.

The same tests that were used in pre-deployment testing may be used for full system test, but it may not be possible to create enough emulated sessions to match the capacity of the entire system. It therefore may be necessary to use different techniques to fully stress the entire system, including using a customer’s system or recorded sessions and customizing test hardware. Regardless, full system test enables measurement of true end-to-end behavior. Queuing and other bottlenecks can be explored and tuned away.

Live system test: This type of testing happens while the system is live, which restricts the type of test operations that can be performed. Low-bandwidth injection and monitoring are two techniques that have proven useful in measuring live system performance. While it is important to minimize the impact on a running system, short-duration, low-bandwidth tests can be used in a number of ways. The latency of metropolitan and wide area networks can be measured with “ping” packets, or tested more extensively with live system protocols using advanced tools. In some cases, special test accounts may enable full application testing. For example, in trading applications a fake broker could execute trades for a fake stock. Such low-bandwidth tests are often run on a recurring basis, varying from once every second to once every several minutes. The values obtained can be plotted over time for a day or longer periods.
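The recurring low-bandwidth tests described above can be sketched as a small sampling loop. This is an illustrative sketch, not a production tool: it times a TCP connection setup as a stand-in for a “ping” packet (ICMP normally requires elevated privileges), and the function names, probe count, and interval are assumptions. The demo connects only to a listener on the local loopback interface.

```python
import socket
import time

def tcp_connect_probe(host, port, timeout_s=1.0):
    """One probe: open and immediately close a TCP connection (stand-in for ping)."""
    with socket.create_connection((host, port), timeout=timeout_s):
        pass

def sample_latency(probe, count, interval_s):
    """Call probe() count times, interval_s apart; return (wall_time, latency_ms) pairs."""
    samples = []
    for i in range(count):
        started = time.perf_counter()
        probe()
        latency_ms = (time.perf_counter() - started) * 1000.0
        samples.append((time.time(), latency_ms))
        if i < count - 1:
            time.sleep(interval_s)
    return samples

if __name__ == "__main__":
    # Demo against a loopback listener so that no real system is touched.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen()
        host, port = listener.getsockname()
        for stamp, ms in sample_latency(lambda: tcp_connect_probe(host, port), 3, 0.1):
            print(f"{time.strftime('%H:%M:%S', time.localtime(stamp))}  {ms:.3f} ms")
```

Each sample carries a wall-clock timestamp so that the values can be plotted over a day or longer. Passing the probe in as a function allows the same loop to drive an application-level test, such as a trade placed through a test account.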

Live system monitoring involves the placement of high-speed network taps at key locations in the system. These taps are connected to intelligent concentrators or filters, and the data is then forwarded to sophisticated analysis tools. High-speed, high-capacity devices record all network traffic or a filtered subset of it. Depending on the applications being monitored, some instantaneous measurements may be available. For example, some monitoring equipment used in trading applications can match up FIX transaction messages to provide instantaneous latency measurements. More often, post-analysis of the recorded data is required to provide performance metrics.
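The message matching described above might look roughly like the following sketch, which pairs each FIX order with its first acknowledgement and reports the time between them. It relies on the standard FIX tags 35 (MsgType, where D is a new order and 8 is an execution report) and 11 (ClOrdID). The capture format, a list of timestamped raw messages, and the abbreviated sample messages are hypothetical and do not reflect how any particular monitoring product works.

```python
SOH = "\x01"  # standard FIX field delimiter

def parse_fix(raw):
    """Split a raw FIX message into a {tag: value} dictionary."""
    pairs = (item.split("=", 1) for item in raw.strip(SOH).split(SOH) if "=" in item)
    return {tag: value for tag, value in pairs}

def order_ack_latencies(captured):
    """Match each new order (35=D) to its first execution report (35=8) by ClOrdID.

    captured: iterable of (timestamp_us, raw_fix_message) in capture order.
    Returns {ClOrdID: latency_us}.
    """
    sent_at = {}
    latencies = {}
    for timestamp_us, raw in captured:
        msg = parse_fix(raw)
        order_id = msg.get("11")
        if order_id is None:
            continue
        if msg.get("35") == "D":
            sent_at.setdefault(order_id, timestamp_us)
        elif msg.get("35") == "8" and order_id in sent_at and order_id not in latencies:
            latencies[order_id] = timestamp_us - sent_at[order_id]
    return latencies

# Hypothetical tap capture; real messages also carry header and trailer fields
capture = [
    (1_000, "35=D\x0111=A1\x0155=XYZ\x01"),
    (1_200, "35=D\x0111=A2\x0155=XYZ\x01"),
    (1_850, "35=8\x0111=A1\x0139=0\x01"),
    (2_900, "35=8\x0111=A2\x0139=0\x01"),
]
print(order_ack_latencies(capture))  # {'A1': 850, 'A2': 1700}
```

Run against a live feed, the same matching gives per-order latency as the acknowledgements arrive. Run against recorded traffic, it becomes part of the post-analysis.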

Measuring latency before it costs you money
Latency is an important performance indicator in most networked applications. Many factors, however, contribute to latency inside of and between data centers. A variety of tools and techniques, such as those described above, are available to discover the sources of latency so that those areas may be improved.

Mike Haugh is senior manager of market development at Ixia. He can be reached at [email protected].