
Scale-Up Ethernet in the AI Era

AI clusters now need a dedicated scale-up fabric for tightly coupled accelerators. Here is why it differs from classic data-center networking, and how open efforts are reinventing Ethernet to deliver memory-semantic connectivity at low-microsecond latency.

Background & motivation

Large AI models are split across many accelerators (XPUs/GPUs), and that splitting creates two very different network domains. Scale-out connects servers into clusters across the data center, the job classic Ethernet and InfiniBand have always done. Scale-up connects a small set of accelerators inside a rack or pod so tightly that they can read and write each other's memory directly, effectively behaving like one large "super GPU."

Tensor and expert parallelism live in that scale-up domain, where a single collective operation can stall thousands of compute units if the interconnect is slow. Historically this tier has been proprietary, with NVIDIA's NVLink the most common interface. The industry is now standardizing open alternatives that pair memory semantics with the economics of high-volume Ethernet.

Architecture: what a scale-up fabric must do

A scale-up network has three non-negotiable requirements: very low latency, very high bandwidth, and native memory semantics — one-sided put, get, and atomic operations rather than message passing. Two open efforts target this directly:

  • UALink uses a PCIe-like memory-semantic protocol with fixed-size frames, combining the low latency and load/store model of PCIe with Ethernet-class data rates.
  • Scale-Up Ethernet (SUE) reuses the Ethernet ecosystem for the scale-up tier. Broadcom contributed the SUE 1.0 specification to the Open Compute Project in July 2025; it targets sub-2 µs latency, up to 9.6 Tbps of bandwidth per XPU-to-XPU pair, and scaling to as many as 1,024 XPUs over 200/400/800G ports. At OCP Global Summit 2025 the effort broadened into ESUN (Ethernet for Scale-Up Networking), a workstream backed by AMD, Arm, Arista, Cisco, HPE, Marvell, Meta, Microsoft, NVIDIA, OpenAI, and Oracle.
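
To make "memory semantics" concrete, here is a minimal Python sketch of one-sided put, get, and atomic operations. It is a toy model, not the UALink or SUE programming interface: the class XpuMemoryWindow and its method names are hypothetical, and a lock stands in for the atomicity that fabric hardware would provide. The point it illustrates is that the initiator names the remote address itself, and the target never posts a matching receive.

```python
import threading


class XpuMemoryWindow:
    """Hypothetical model of the memory one accelerator exposes to its peers."""

    def __init__(self, size_words: int) -> None:
        self._words = [0] * size_words
        # The lock stands in for atomicity a real fabric endpoint would enforce.
        self._lock = threading.Lock()

    def _check(self, addr: int, count: int) -> None:
        if addr < 0 or count < 0 or addr + count > len(self._words):
            raise IndexError("access outside the exposed window")

    def put(self, addr: int, values: list) -> None:
        """One-sided write: the initiator supplies both address and data."""
        self._check(addr, len(values))
        with self._lock:
            self._words[addr:addr + len(values)] = values

    def get(self, addr: int, count: int) -> list:
        """One-sided read: returns a copy of the remote words."""
        self._check(addr, count)
        with self._lock:
            return list(self._words[addr:addr + count])

    def fetch_add(self, addr: int, delta: int) -> int:
        """Atomic read-modify-write: returns the value held before the add."""
        self._check(addr, 1)
        with self._lock:
            old = self._words[addr]
            self._words[addr] = old + delta
            return old


# Usage: an initiator writes into a peer's window, reads it back,
# and takes two tickets from a shared counter at address 0.
peer = XpuMemoryWindow(size_words=16)
peer.put(4, [10, 20, 30])
readback = peer.get(4, 3)
first_ticket = peer.fetch_add(0, 1)
second_ticket = peer.fetch_add(0, 1)
```

In a message-passing model the target would have to call a receive for each of these transfers; here it takes no action at all, which is what lets a group of accelerators treat peer memory as an extension of their own.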

Three mechanisms make this work, and they are exactly where SoC and NIC designers spend their effort:

  • Memory semantics: shared-memory load/store and atomics across accelerators, not just packet delivery.
  • Reliable transport: selective retransmit and lossless flow control (for example, separate PFC classes for requests and acknowledgements, so that a backlog of requests cannot block the acknowledgements queued behind it and deadlock the link).
  • Congestion management: moving beyond reactive ECN/drop signals toward in-band telemetry and multi-path packet spraying, as the Ultra Ethernet Consortium (UEC) transport does in parallel for scale-out.
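
The reliable-transport mechanism can be sketched in a few lines of Python. This is an illustrative model of selective retransmit under simple assumptions (integer sequence numbers starting at 0, a single flow, no timers), not the SUE or UEC wire protocol; SelectiveAckReceiver and packets_to_retransmit are hypothetical names. The receiver buffers packets that arrive out of order, as they may under packet spraying, and reports which ones it holds so the sender resends only the gaps.

```python
class SelectiveAckReceiver:
    """Toy receiver: buffers out-of-order packets and delivers them in order."""

    def __init__(self) -> None:
        self.next_expected = 0   # lowest sequence number not yet delivered
        self.buffer = {}         # out-of-order packets held for later delivery
        self.delivered = []      # payloads released to the application, in order

    def on_packet(self, seq: int, payload: str):
        """Accept a packet; return (cumulative_ack, selectively_acked_seqs)."""
        if seq >= self.next_expected:        # ignore duplicates of delivered data
            self.buffer[seq] = payload
        while self.next_expected in self.buffer:
            self.delivered.append(self.buffer.pop(self.next_expected))
            self.next_expected += 1
        return self.next_expected, sorted(self.buffer)


def packets_to_retransmit(highest_sent: int, cumulative_ack: int, sacked: list) -> list:
    """Sender side: resend only sequence numbers the receiver has not reported."""
    held = set(sacked)
    return [s for s in range(cumulative_ack, highest_sent + 1) if s not in held]


# Usage: packets 0..4 are sent, packet 2 is lost in flight.
receiver = SelectiveAckReceiver()
for seq in (0, 1, 3, 4):
    cumulative_ack, sacked = receiver.on_packet(seq, f"chunk-{seq}")

resend = packets_to_retransmit(highest_sent=4, cumulative_ack=cumulative_ack, sacked=sacked)

for seq in resend:
    cumulative_ack, sacked = receiver.on_packet(seq, f"chunk-{seq}")
```

In this scenario only packet 2 is resent. A go-back-N scheme would resend packets 2, 3, and 4, which wastes bandwidth that a scale-up link cannot spare.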

How Ivy Microsystems helps

Building an XPU, a scale-up switch, or a NIC for these emerging fabrics means co-designing the SoC architecture, the on-chip interconnect, and the transport firmware, often before the standards have fully settled. Our tools and IP are built for exactly that kind of moving target:

  • SoC Pilot — explore scale-up SoC and switch topologies on a canvas, integrate interconnect IP, and generate RTL with design-rule checking as the architecture evolves.
  • Silicon IP — RISC-V cores, Networks-on-Chip, and DMA controllers provide the building blocks for memory-semantic data movement and high-throughput on-chip transport.
  • Firmware Pilot — validate transport, reliability, and congestion-control firmware against a virtual SoC model, so protocol decisions are de-risked long before silicon.

Designing for scale-up?

Talk to us about building accelerators, switches, and NICs for next-generation AI fabrics.