Smart CXL Memory Expander

AI inference is increasingly limited by memory, not compute. CXL Type 3 expanders add capacity and bandwidth beyond the CPU's DRAM channels — and a "smart" expander can do more than just hold data.

Background & motivation: the memory wall

As models grow and context windows lengthen, memory capacity and bandwidth, rather than raw FLOPS, increasingly decide what can run in production. LLM inference is especially hungry: the key-value (KV) cache that holds attention state routinely runs to tens or hundreds of gigabytes, and spilling it to slower storage cripples throughput. CPU DDR channels are tied to the socket and GPU HBM is fixed when the package is built, so neither can scale capacity independently of compute.
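To see how a KV cache reaches that scale, here is a back-of-the-envelope estimate. It is a minimal sketch: the model shape (80 layers, 8 KV heads, head dimension 128, 16-bit values), the 32K-token context, and the batch of 16 are illustrative assumptions, not figures for any particular deployment.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    """Bytes needed to hold keys and values for every layer, token, and sequence."""
    # Factor of 2: one tensor for keys and one for values.
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Hypothetical 70B-class configuration with grouped-query attention (assumed values).
per_token = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128, seq_len=1, batch=1)
total = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128, seq_len=32_768, batch=16)

print(f"{per_token / 2**10:.0f} KiB per token")  # prints "320 KiB per token"
print(f"{total / 2**30:.0f} GiB in total")       # prints "160 GiB in total"
```

The cache grows linearly with both context length and batch size, which is why capacity, rather than compute, often sets the limit on concurrent long-context requests.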

Compute Express Link (CXL) attaches memory over the PCIe physical layer with cache-coherent load/store semantics. It lets you add capacity and bandwidth without consuming the host's primary DRAM channels, opening a new, flexible tier in the memory hierarchy.

Architecture: CXL Type 3 and tiering

A memory expander is a CXL Type 3 device: it attaches DDR5 (or other media) to the host as a PCIe add-in card or EDSFF module and presents it as additional, byte-addressable system memory. The spec has advanced quickly:

  • CXL 2.0 introduced memory pooling, letting multiple hosts draw from a shared pool with distinct allocations.
  • CXL 3.0 (built on PCIe 6.0 at 64 GT/s) added true shared memory and multi-level switching for fan-out and cascade topologies.
  • CXL 4.0 (2025, built on PCIe 7.0) doubles the rate again to 128 GT/s.
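The per-lane rates above translate into link bandwidth roughly as follows. This is a simplified sketch: it computes raw signalling bandwidth for one direction and ignores encoding, FLIT, and protocol overhead, so delivered bandwidth will be lower. The 32 GT/s entry for CXL 2.0 (PCIe 5.0) and the x16 link width are added here for comparison and are not taken from the list.

```python
# Per-lane signalling rates in GT/s. The 64 and 128 GT/s figures come from the
# list above; 32 GT/s for CXL 2.0 (PCIe 5.0) is added for comparison.
RATES_GT_S = {"CXL 2.0": 32, "CXL 3.0": 64, "CXL 4.0": 128}

def raw_link_gb_s(rate_gt_s, lanes=16):
    """Raw one-direction bandwidth in GB/s: one bit per transfer per lane."""
    return rate_gt_s * lanes / 8

for generation, rate in RATES_GT_S.items():
    print(f"{generation}: x16 ~ {raw_link_gb_s(rate):.0f} GB/s per direction (raw)")
```

Under these assumptions an x16 link moves from about 64 GB/s to 128 GB/s to 256 GB/s per direction across the three generations.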

A CXL controller adds roughly 70 ns over direct-attached DRAM, which is still 20–50× faster than NVMe storage. That makes CXL a distinct tier between HBM/DDR and storage, with software migrating hot data toward the faster tier and cold data toward the slower one. For AI specifically, offloading the KV cache to CXL has shown large throughput and energy-per-token gains versus SSD- or RDMA-based caching, and near-data processing on the expander can extend usable context length further.
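The effect of hot/cold placement can be illustrated with a toy model. This is a sketch under stated assumptions, not a description of any real tiering implementation: the 100 ns local DRAM latency, the access trace, and the "keep the most frequently accessed pages in DRAM" policy are all hypothetical. Only the roughly 70 ns CXL adder comes from the text.

```python
from collections import Counter

DRAM_NS = 100          # assumed local DRAM load latency (not from the text)
CXL_NS = DRAM_NS + 70  # the text: a CXL controller adds roughly 70 ns

def place_pages(accesses, dram_slots):
    """Return the set of pages to keep in DRAM: the most frequently accessed ones."""
    counts = Counter(accesses)
    return {page for page, _ in counts.most_common(dram_slots)}

def average_latency_ns(accesses, dram_pages):
    """Mean load latency for a trace, given which pages live in DRAM."""
    total = sum(DRAM_NS if page in dram_pages else CXL_NS for page in accesses)
    return total / len(accesses)

# Hypothetical trace: page 0 is hot, pages 1 to 3 are touched once each.
trace = [0, 0, 0, 0, 0, 0, 1, 0, 2, 3]
hot = place_pages(trace, dram_slots=1)

print(hot, average_latency_ns(trace, hot))  # prints "{0} 121.0"
print(average_latency_ns(trace, set()))     # prints "170.0" (everything on CXL)
```

With a single DRAM slot holding the hot page, the average load latency in this toy trace drops from 170 ns to 121 ns. Real tiering software works from sampled access telemetry and migrates pages over time rather than placing them once.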

A smart expander is therefore more than a media controller: it can host tiering and migration logic, telemetry, and even near-data processing — which is what turns a memory board into a system-level design problem.

How Ivy Microsystems helps

How fast can you build your next CXL system? Designing a smart expander means building a real SoC (controller, media management, fabric, and firmware) and proving it against workloads before committing silicon. Our tools and IP let you do that early, at full-system scale, and without real silicon:

  • SoC Pilot — architect the CXL controller SoC on a canvas, integrate IP, and generate RTL with design-rule checking.
  • Silicon IP — Networks-on-Chip for the internal fabric, DMA controllers for data movement and migration, and RISC-V cores for management and tiering firmware.
  • Firmware Pilot — validate controller firmware, memory-tiering and migration policy, and host interaction against a full-system SoC model long before tape-out.

Designing a CXL device?

Talk to us about building and validating a memory expander before silicon.