The Birth of Cognitive Networks for Data Center Interconnect

Customers today expect better performance on a wide range of applications like streaming television, enhanced 4K video, and interactive gaming, all of which are increasing traffic at a steady 26% CAGR (Cisco VNI). This traffic is running over data center interconnect networks that have grown into a more than $1 billion market according to Ovum. To manage this growth, service providers as well as content providers must increase their operational efficiencies to gain more from their existing infrastructure.

Part of this challenge has been addressed at the optical layer, where the network is more reconfigurable and adjustable than ever before. Flexible symbol rates, flexible modulation, and flexible grid options have significantly improved the ability to balance capacity against distance. The optical layer now exposes many adjustable knobs that make it more dynamic, and it generates large amounts of data. The question is, how will gathering and analyzing all this data help drive down the total cost of the network?

Enabling such cost reduction requires a shift in focus to software advances in fields such as analytics and machine learning. The increasing use of software to control, analyze, and manage the network has led to what are today called Cognitive Networks (CN).

Cognitive networking is an application of artificial intelligence (AI), where the platform collects, learns, plans, and then acts on the network. To do this, we leverage machine learning, which is a subset of AI: machines learn the behavior of the network by analyzing the data using mathematical and statistical tools. A machine learning tool collects the data and delivers it to a rule and/or decision-making software block that ultimately sends the proper corrective action(s) to adjust the network. Previously, offline tools and processing handled only a small part of this process. Today, with software-defined networks, machines can learn the network and associate rules, policies, and actions to correct performance issues or outages/events, and can even anticipate a network event.
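As a rough illustration of the collect, learn, plan, and act cycle described above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the class name `CognitiveLoop`, the 3-sigma threshold, the five-sample warm-up, and the choice of a simple mean and standard deviation as the "learning" step. A production system would use far richer models, and the sketch only recommends an action rather than applying it.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class CognitiveLoop:
    """Toy collect -> learn -> plan -> act loop for a single monitored metric."""

    threshold_sigma: float = 3.0  # assumed policy: flag readings > 3 sigma from baseline
    min_samples: int = 5          # assumed warm-up length before any decision is made
    history: list = field(default_factory=list)

    def collect(self, reading: float) -> None:
        """Store a reading that is treated as normal behavior."""
        self.history.append(reading)

    def learn(self):
        """Return the baseline as (mean, std dev), or None if history is too short."""
        if len(self.history) < self.min_samples:
            return None
        return mean(self.history), pstdev(self.history)

    def plan(self, reading: float) -> str:
        """Compare a new reading against the learned baseline and pick a decision."""
        baseline = self.learn()
        if baseline is None:
            return "observe"
        mu, sigma = baseline
        if sigma > 0 and abs(reading - mu) > self.threshold_sigma * sigma:
            return "recommend_corrective_action"
        return "no_action"

    def act(self, reading: float) -> str:
        """Run one pass of the loop; anomalies are reported, not applied automatically."""
        decision = self.plan(reading)
        if decision != "recommend_corrective_action":
            self.collect(reading)  # only normal readings update the baseline
        return decision


# Hypothetical readings of some link metric; the last one is a clear outlier.
loop = CognitiveLoop()
decisions = [loop.act(r) for r in [10.0, 10.1, 9.9, 10.0, 10.2, 10.1, 7.5]]
```

In this sketch the loop observes silently during warm-up, accepts the sixth reading as normal, and flags the final reading for a corrective action that an operator would still need to approve.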

Ready or not?

But are we really ready for AI and machines to respond and reconfigure the network or to make these types of changes in real time? It is unlikely that a service or content provider today would allow these actions without some human interaction or intervention. However, it is a great target to work towards, so let’s look at some realistic approaches that are feasible today:

Today we use static offline design tools. If we had dynamic learning tools that could feed activity back to the operator, we could predict the proper amplifier settings or the modulation technique to minimize network margins and provide optimized data rates across the data center interconnect metro, regional, or long-haul network. Since machine learning is continually updating and adapting, the margins in the network can be pushed tighter, thus lowering the overall network cost.
Network failures and restoration become more predictable with cognitive networks. If a link is degrading, the machine learning algorithms may suggest shifting traffic, and doing so when utilization is lowest so that the impact on the network is minimized.
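The two use cases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the table of modulation formats, data rates, and required OSNR values contains placeholder numbers chosen for the example, not vendor or standards figures. The first function picks the highest-rate format that still fits a given margin; the second finds the contiguous window with the lowest total utilization for a planned traffic shift.

```python
# Placeholder table: (format name, data rate in Gb/s, required OSNR in dB).
# These values are assumptions for the sketch, not real engineering figures.
FORMATS = [
    ("QPSK", 100, 12.0),
    ("8QAM", 150, 16.0),
    ("16QAM", 200, 19.0),
]


def pick_modulation(estimated_osnr_db, margin_db):
    """Return the highest-rate format whose required OSNR fits within the margin.

    Returns None when no format is feasible on the link.
    """
    usable_osnr = estimated_osnr_db - margin_db
    feasible = [f for f in FORMATS if usable_osnr >= f[2]]
    if not feasible:
        return None
    return max(feasible, key=lambda f: f[1])[0]


def quietest_window(utilization, window):
    """Return the start index of the contiguous window with the lowest total load.

    `utilization` is a sequence of per-interval values (for example, hourly
    averages between 0 and 1). Ties resolve to the earliest window.
    """
    if window < 1 or window > len(utilization):
        raise ValueError("window must be between 1 and len(utilization)")
    starts = range(len(utilization) - window + 1)
    return min(starts, key=lambda i: sum(utilization[i:i + window]))


# Same link, two different margins: a tighter margin allows a faster format.
conservative = pick_modulation(estimated_osnr_db=20.0, margin_db=3.0)
tight = pick_modulation(estimated_osnr_db=20.0, margin_db=1.0)

# Hypothetical utilization samples; find the best two-interval window.
start = quietest_window([0.6, 0.5, 0.3, 0.2, 0.2, 0.4, 0.7, 0.9], window=2)
```

With these placeholder numbers, shrinking the margin from 3 dB to 1 dB moves the same link from the 150 Gb/s format to the 200 Gb/s one, which is the kind of gain that tighter, continuously learned margins are meant to unlock.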

These are valuable use cases that improve operational efficiencies. Today, there is a growing amount of network sensor data available. The open issues are how to capture that data, how accurate and relevant the collected data is, and how to ensure that the right parameters are correlated properly. The market still needs common collection methods that work across multi-vendor environments. For the machine to learn, the information collected and shared must be in a similar format. This means we need common data models that make the network vendor-agnostic. A lot of standardization progress is taking place in forums such as OpenROADM, OpenConfig, and the Telecom Infra Project (TIP). We see service and content providers pushing harder and, in many cases, driving common model creation.
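To make the idea of a common data model concrete, here is a minimal Python sketch that maps two differently shaped telemetry records into one shared structure. All names are hypothetical: `OpticalSample`, the vendor labels, and every field name were invented for this example and are not taken from OpenROADM, OpenConfig, or any vendor's actual schema.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class OpticalSample:
    """Hypothetical vendor-neutral record for one optical measurement."""

    link_id: str
    timestamp_s: float
    rx_power_dbm: float
    pre_fec_ber: float


def mw_to_dbm(milliwatts):
    """Convert optical power from milliwatts to dBm."""
    return 10.0 * math.log10(milliwatts)


# Invented vendor formats: common field -> (vendor field name, converter).
VENDOR_MAPS = {
    "vendor_a": {
        "link_id": ("port", str),
        "timestamp_s": ("ts_ms", lambda ms: ms / 1000.0),
        "rx_power_dbm": ("rxPwr_dBm", float),
        "pre_fec_ber": ("preFecBer", float),
    },
    "vendor_b": {
        "link_id": ("interface", str),
        "timestamp_s": ("epoch", float),
        "rx_power_dbm": ("rx_power_mw", mw_to_dbm),
        "pre_fec_ber": ("ber_pre_fec", float),
    },
}


def normalize(vendor, record):
    """Translate one vendor-specific record into the common OpticalSample."""
    mapping = VENDOR_MAPS[vendor]
    return OpticalSample(
        **{common: convert(record[source])
           for common, (source, convert) in mapping.items()}
    )


sample_a = normalize("vendor_a", {"port": "1/1/1", "ts_ms": 1500,
                                  "rxPwr_dBm": -3.2, "preFecBer": 1e-4})
sample_b = normalize("vendor_b", {"interface": "och-7", "epoch": 2.0,
                                  "rx_power_mw": 1.0, "ber_pre_fec": 2e-5})
```

Once both records share the same fields and units, a single learning pipeline can consume them without vendor-specific branches, which is the practical payoff of the common models the forums are working on.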

What are the next steps? Identifying what anomalies should be captured, what data is relevant, and what data needs to be discarded is a big step. Then more work needs to be done on the algorithms. It’s unlikely a single algorithm will solve these issues; it will likely be a combination of algorithms working together to deliver a solution. Leveraging cognitive networks, we can solve real customer problems and bring real value by lowering the total cost of ownership, delivering services faster and more reliably—all on an optimized network.

Walid Wakim is a Distinguished Engineer at Cisco.