Key Highlights
- Distributed AI workloads require scale-across networks to overcome physical and power limitations of single data centers.
- Power efficiency and reliability are critical, with innovations like liquid cooling and multi-rail amplifiers helping to address these challenges.
- Coherent pluggables, especially 800G ZR/ZR+, are key to scaling transport bandwidth while maintaining power and operational efficiency.
- L-band deployment alongside the C-band is expanding per-fiber capacity to support growing AI traffic demands.
- Simplifying network operations through disaggregation and interoperability is vital for future scalability and risk mitigation.
The race to deliver AI workloads has led network operators to build ever-larger data centers to accelerate time-to-market. Due to physical and power constraints, however, scale-up and scale-out AI architectures confined to a single data center are no longer sufficient to meet the processing demands of those workloads.
Scale-across networks connecting geographically diverse AI clusters are becoming critical because they address the growing need to distribute AI workloads across multiple data centers that operate as a single entity. This new architecture is crucial to overcoming physical limitations, such as power availability and space constraints, that individual data center sites face today.
In this article, we’ll discuss this architectural shift and highlight strategies for building a network that can scale transport bandwidth to meet future AI demands.
AI is the new cloud
The AI build-out is similar in some ways to what the industry went through 10 years ago in cloud networking, but at a far larger scale. As before, new requirements are driving changes to the underlying architecture. In the cloud build-out, cost and power were the key drivers as data center size scaled dramatically. In this AI build-out, power and reliability are the top priorities.
The reason is that GPU utilization is critical to delivering AI performance, and utilization suffers significantly if training cycles are interrupted and must be restarted. Power remains important because every watt spent on networking is a watt unavailable for compute. While latency is a factor to consider in these new architectures, the latest algorithmic innovations allow higher latencies to be managed further up the stack.
One difference with AI is the sheer scale of its growth. LightCounting estimates that data center capacity has increased 7X over seven years due to AI build-outs. The same type of growth is expected for data center interconnect (DCI) links as the industry seeks to connect more clusters: LightCounting forecasts 4.5X growth in connectivity between data centers over the next seven years.
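Those multiples are easier to compare when converted into annual growth rates. A minimal sketch, using only the figures cited above (the conversion itself is just arithmetic):

```python
# Convert a cumulative growth multiple over N years into the
# equivalent compound annual growth rate (CAGR).

def cagr(multiple: float, years: int) -> float:
    """Annualized growth rate implied by `multiple` over `years`."""
    return multiple ** (1 / years) - 1

# LightCounting figures cited above:
print(f"7X over 7 years   -> ~{cagr(7.0, 7):.0%} per year")  # ~32% per year
print(f"4.5X over 7 years -> ~{cagr(4.5, 7):.0%} per year")  # ~24% per year
```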
Scalability, power, and reliability are top concerns
Power is a problem across the board, and while technologies such as liquid cooling are emerging to help, these solutions introduce new operational challenges and add to already rising infrastructure costs. GPU investments are soaring as the industry builds ever-larger clusters, and the power needed to operate them will increase dramatically compared with today's single-site data centers. Today's largest data centers typically employ hundreds of thousands of GPUs, but plans are in place for data centers with a million GPUs or more. Going beyond these levels will be difficult without some degree of geographic diversity, which is why addressing the challenges of scale-across networking is so important.
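A rough back-of-envelope sketch puts these numbers in perspective. The ~1 kW-per-GPU all-in figure below is purely an assumption for illustration, not a number from this article:

```python
# Back-of-envelope site power for large GPU clusters.
# ASSUMPTION: ~1 kW per GPU including networking, cooling, and
# facility overhead; real figures vary widely by deployment.

KW_PER_GPU = 1.0  # hypothetical all-in figure, kW

for gpus in (100_000, 1_000_000):
    megawatts = gpus * KW_PER_GPU / 1_000
    print(f"{gpus:>9,} GPUs -> ~{megawatts:,.0f} MW")
# ~100 MW for today's largest single sites; ~1 GW for a
# million-GPU cluster, which is hard to source at one location
# and is the motivation for geographic diversity.
```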
Scale-across use cases are estimated to require 8X the bandwidth between sites compared to cloud DCI. Instead of deploying bandwidth incrementally at the wavelength level, some operators are considering multi-fiber bandwidth granularity. This will require new approaches to power and reliability to better optimize the network. New strategies under consideration include multi-rail amplifiers, which let vendors share common elements across multiple line systems, and media converters that integrate the client and line functions into a single DSP.
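To see why planning shifts from wavelengths to whole fibers at these levels, consider a rough sketch. The 8X multiplier is from the estimate above; the per-fiber capacity and the baseline cloud-DCI demand are illustrative assumptions only:

```python
# Why scale-across planning moves from wavelengths to whole fibers.
# ASSUMPTIONS (illustrative): a C-band fiber carrying 32 x 800G
# wavelengths (~25.6 Tb/s), and a 100 Tb/s cloud-DCI baseline.
import math

FIBER_CAPACITY_TBPS = 32 * 0.8    # ~25.6 Tb/s per C-band fiber
CLOUD_DCI_DEMAND_TBPS = 100       # hypothetical baseline between sites
SCALE_ACROSS_MULTIPLIER = 8       # 8X figure cited above

demand = CLOUD_DCI_DEMAND_TBPS * SCALE_ACROSS_MULTIPLIER
fibers = math.ceil(demand / FIBER_CAPACITY_TBPS)
print(f"~{demand} Tb/s between sites -> ~{fibers} fibers")  # ~32 fibers
```

At tens of fibers per route, lighting capacity one wavelength at a time stops being practical, which is what drives the multi-fiber granularity described above.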
All of this scale is leading to several new challenges that need to be addressed, including:
- Operational scalability: Disaggregation will be vital because it allows operators to choose solutions from different vendors, limiting supply chain risk and exposure to component failures. The simpler we make networks, the easier they will be to scale in the future. This includes prioritizing interoperability and using fewer components.
- Power efficiency: Every watt not used for networking can be applied to GPUs, delivering more compute performance. In addition, power availability will be a challenge, particularly at regeneration sites. Finding the right trade-off between power and performance will be critical.
- Reliability: AI training is very sensitive to any disruption in traffic flow. This problem is magnified at scale because issues may arise at remote regenerator sites, where operational support is even more challenging. A back-of-envelope sense of that utilization cost is sketched after this list.
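One way to make the utilization cost of failures concrete is the classic Young/Daly rule of thumb for the optimal checkpoint interval, tau ~ sqrt(2 * delta * MTBF), where delta is the checkpoint write time. This technique is not from the article, and the delta and MTBF values below are hypothetical, but the trend is the point:

```python
# Why reliability translates directly into GPU utilization.
# Young/Daly rule of thumb: optimal checkpoint interval
#   tau ~= sqrt(2 * delta * mtbf)
# ASSUMPTIONS (hypothetical): 5-minute checkpoint writes; MTBF
# values chosen only to illustrate the trend.
import math

def overhead_fraction(delta_min: float, mtbf_min: float) -> float:
    """Approximate fraction of wall-clock time lost to
    checkpointing plus recomputation after failures."""
    tau = math.sqrt(2 * delta_min * mtbf_min)  # optimal interval
    return delta_min / tau + tau / (2 * mtbf_min)

for mtbf_hours in (24, 4):
    lost = overhead_fraction(delta_min=5, mtbf_min=mtbf_hours * 60)
    print(f"MTBF {mtbf_hours:>2} h -> ~{lost:.0%} of GPU time lost")
# A 6X worse MTBF here costs roughly 2.5X more GPU time.
```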
Coherent can scale across the network
Fortunately, the tools the industry needs to support scale-across networking have been in use for quite some time, including both coherent pluggable and transponder solutions, depending on the customer's needs and network configuration.
For pluggables, the same benefits and efficiency that these modules delivered to traditional DCI will become even more critical in these new power-constrained AI environments. As the use cases for coherent pluggables have expanded with each new generation, this efficiency has continued to spread across the network.
As we look toward the future, 800G ZR/ZR+ is well positioned to be the workhorse of these scale-across networks, followed by 1.6T. For this reason, Cignal AI has forecast strong growth for 800G pluggables, with 100k units/year starting in 2026. However, it's important to note that while growth is expected in the coherent pluggables segment, many customers prefer separate transponder solutions that still prioritize power, reliability, and performance.
LightCounting forecasts that both segments of the market are expected to grow due to the sheer volume of AI traffic demand.
L-band's rise in next-gen architectures
In the past, advances in coherent technology increased efficiency by doubling the baud rate while maintaining the same modulation format. We are now at the point, however, where doubling from 400G ZR+ to 800G ZR+ doubles the capacity per wavelength but not the capacity of the C-band: because the occupied spectrum per channel doubles along with the data rate, half as many wavelengths fit in the same spectrum. As a result, we are now seeing the L-band being deployed alongside the C-band from day one – and, in some cases, at comparable volumes.
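The arithmetic behind this is straightforward. The sketch below assumes a representative channel plan (~4.8 THz of C-band spectrum, 75 GHz slots for 400G ZR+, 150 GHz slots for 800G ZR+); these grid figures are typical values used here for illustration, not numbers from the article:

```python
# Why doubling the data rate by doubling the baud rate leaves
# total C-band capacity flat, and why the L-band helps.
# ASSUMPTIONS (typical but illustrative): ~4.8 THz of C-band
# spectrum; 400G ZR+ in 75 GHz slots; 800G ZR+ in 150 GHz slots.

C_BAND_GHZ = 4800

def band_capacity_tbps(rate_gbps: int, slot_ghz: int, spectrum_ghz: int) -> float:
    channels = spectrum_ghz // slot_ghz
    return channels * rate_gbps / 1000

print(band_capacity_tbps(400, 75, C_BAND_GHZ))   # 64 channels -> 25.6 Tb/s
print(band_capacity_tbps(800, 150, C_BAND_GHZ))  # 32 channels -> 25.6 Tb/s
# Same total: twice the capacity per wavelength, half the wavelengths.
# Adding a similarly sized L-band roughly doubles usable spectrum:
print(band_capacity_tbps(800, 150, 2 * C_BAND_GHZ))  # ~51.2 Tb/s
```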
The industry will rise to the challenge… once again
Scale-across demand in AI will drive economies of scale for specific transport solutions. In the near term, that includes technologies such as 800G ZR+ and performance-optimized modules such as Acacia's CIM 8. We anticipate other network applications will also leverage these economies of scale, as we observed when 400G pluggable adoption in service provider networks mirrored the scale of cloud deployments.
Looking ahead to subsequent generations, the industry will need to continue to align on the best ways to address power and reliability challenges. The only certainty is that this has been done before, and we believe it can be done again.
About the Author

Tom Williams
Tom Williams is Vice President of Marketing at Acacia. Prior to Acacia, Tom spent 14 years at Optium and Finisar Corporations, where he held various management roles, including Director of Product Line Management for high-speed (>100G) transmission products. Tom has also held positions at Lucent Technologies and Northrop Grumman Corporation. He has an MS in Electrical Engineering from Johns Hopkins University and a BS degree in Electrical Engineering and Physics from Widener University.