XCENA Raises $135M Betting Memory Is AI's Real Bottleneck

Every AI infrastructure conversation in 2026 eventually circles back to GPUs - whether NVIDIA can make enough of them, who gets allocation, and how long training runs take. XCENA thinks that framing misses the actual choke point.

The South Korean startup closed a $135 million Series B at a $570 million valuation on May 29, 2026, led by Seoul-based VCs Atinum and IMM Investment with participation from Corstone Asia. Existing investors SBI Investment and Mirae Asset Capital also joined. Total funding now sits at $185 million. The company's pitch is direct: inference isn't a compute problem, it's a memory scaling problem, and the chip industry has failed to address that for decades.

TL;DR

$135M Series B at $570M valuation; $185M total raised since 2022
MX1 chip: CXL 3.2 device with thousands of RISC-V cores embedded alongside DDR5-8400 memory
Processes KV cache, vector search, and data orchestration directly at the memory tier - no CPU/GPU round trips
MX1P targets mass production late 2026; revenue expected in 2027
Company claims one MX1-equipped server can replace workloads currently requiring ten machines

The Memory Wall Inference Has Hit

Running a large language model at scale means managing a KV cache that grows linearly with sequence length and concurrent user count. A 70B model serving 1,000 simultaneous requests at 4K context can need tens of gigabytes of attention cache per transformer layer, multiplied across the full layer stack - and whatever does not fit in GPU memory has to shuttle between storage, CPU, and GPU as requests are scheduled.
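To make the scaling concrete, here is a back-of-the-envelope estimate. The model shape (80 layers, 8 grouped-query KV heads, head dimension 128, FP16 values) is an assumption modeled on publicly documented Llama-style 70B configurations, not something XCENA specifies, and the function name is made up for illustration.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch_size, bytes_per_value=2):
    """Bytes needed to hold keys and values for every token of every request."""
    # Factor of 2 covers keys plus values.
    per_token_per_layer = 2 * num_kv_heads * head_dim * bytes_per_value
    return per_token_per_layer * num_layers * seq_len * batch_size


# Assumed Llama-style 70B shape (hypothetical for this scenario).
NUM_LAYERS = 80
total = kv_cache_bytes(num_layers=NUM_LAYERS, num_kv_heads=8, head_dim=128,
                       seq_len=4096, batch_size=1000)
per_layer = total // NUM_LAYERS

print(f"per layer:  {per_layer / 1e9:.1f} GB")   # 16.8 GB
print(f"all layers: {total / 1e12:.2f} TB")      # 1.34 TB
```

Under these assumptions the cache is roughly 17 GB per layer and about 1.3 TB across the stack, far more than a single GPU holds. Doubling either context length or concurrency doubles the figure.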

Why KV Cache Breaks Everything

GPU VRAM is fast but limited. A pair of H100s gives you 160GB of HBM3 at around 3.35 TB/s of bandwidth per GPU. That sounds like plenty until you're doing continuous batching across hundreds of long-context requests. KV cache eviction starts hurting latency long before raw compute hits its ceiling.

XCENA CEO Jin Kim frames the issue bluntly: "CPUs and GPUs have both gotten smarter over the decades. Memory never did." The memory controller logic sitting between DRAM and the rest of the system has stayed static since the DDR3 era while everything else in the compute stack has been redesigned.

Every Inference Token Is a Data Relay Race

For each decode step in autoregressive generation, the system fetches attention keys and values from memory, runs matrix multiplications on the GPU, then writes results back. The math is cheap. The data movement is not. On modern inference servers, memory bandwidth saturation - not FLOP utilization - determines throughput per dollar.
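A rough roofline-style sketch shows why the data movement dominates. It assumes a dense 70B-parameter model in FP16, about 2 FLOPs per parameter per generated token, and the 3.35 TB/s bandwidth figure quoted earlier. The peak compute rate is an assumed round number rather than a measured spec, and the model ignores KV cache reads, multi-GPU sharding, and compute/transfer overlap.

```python
def decode_step_times(params, bytes_per_param, batch_size, mem_bandwidth, peak_flops):
    """Return (memory_time_s, compute_time_s) for one decode step of a dense model.

    Every step streams all weights once regardless of batch size, while
    arithmetic grows with the number of tokens generated in the step.
    """
    memory_time = params * bytes_per_param / mem_bandwidth
    compute_time = 2 * params * batch_size / peak_flops  # ~2 FLOPs per parameter per token
    return memory_time, compute_time


PARAMS = 70e9
BANDWIDTH = 3.35e12   # bytes/s, the H100 figure quoted in the article
PEAK_FLOPS = 1e15     # FLOP/s, an assumed round number, not a measured spec

for batch in (1, 32, 512):
    mem_t, cmp_t = decode_step_times(PARAMS, 2, batch, BANDWIDTH, PEAK_FLOPS)
    bound = "memory" if mem_t > cmp_t else "compute"
    print(f"batch={batch:4d}  memory={mem_t * 1e3:6.2f} ms  "
          f"compute={cmp_t * 1e3:6.2f} ms  -> {bound}-bound")
```

With these assumed numbers the step stays memory-bound until the batch reaches a few hundred tokens, and that is before counting KV cache reads, which add to the memory side as batch size and context length grow.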

This isn't a new observation. Groq built its entire LPU architecture around deterministic memory access, and the broader memory chip squeeze shows how far DRAM demand has outpaced supply. XCENA's bet is that you don't need to redesign the whole compute stack. You just need to put compute inside the memory device itself.

The XCENA MX1P at Future of Memory and Storage 2025, where it won "Most Innovative Memory Technology." The card attaches via CXL 3.2 over PCIe Gen6. Source: servethehome.com

Inside the MX1

The MX1 is a CXL 3.2 device that attaches to the host CPU over a PCIe 6.0 link. From the host's perspective it looks like a remote memory region on a separate NUMA node. Internally it runs thousands of custom RISC-V cores at 1.4 GHz with Cortex-A53 management processors, vector engines with FP32/FP16 support, and up to 1TB of DDR5-8400 DRAM in a quad-channel configuration.

CXL 3.2 as the On-Ramp

CXL (Compute Express Link) is an open interconnect standard built on PCIe that adds cache-coherent memory semantics. CXL 3.2 supports Host-managed Device Memory with device-side coherence via Back-Invalidate (HDM-DB), which lets the host OS treat the MX1's DRAM as a standard NUMA node with no custom driver required.

On a Linux host running kernel 6.2 or later, the MX1 shows up as an attached memory expander and can be managed with standard tooling:

# List CXL devices and memory regions
cxl list -M

# Example output (abbreviated):
# memdev: mem0  |  numa_node: 2  |  ram_size: 274877906944

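# Bind the serving process's host-memory allocations to the MX1 NUMA node.
# --membind applies to the whole process; serve.py and its flags are placeholders.
numactl --membind=2 python3 serve.py --model llama4-70b --kv-cache-only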

The promise is that KV cache entries stay entirely on the MX1. The onboard RISC-V cores handle prefetch scheduling, eviction policy, and vector search for retrieval-augmented generation without that data having to cross the PCIe link back to the host CPU.
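Before pinning anything with numactl, it helps to confirm which NUMA node the expander landed on. The sketch below reads the standard Linux sysfs node directory and reports nodes that have memory but no CPUs, which is how CXL memory expanders commonly appear. The function name is hypothetical, and whether an MX1 shows up exactly this way is an assumption based on the article's description.

```python
import re
from pathlib import Path


def cpuless_memory_nodes(sysfs_root="/sys/devices/system/node"):
    """Return {node_id: total_bytes} for NUMA nodes that have memory but no CPUs."""
    result = {}
    for node_dir in sorted(Path(sysfs_root).glob("node[0-9]*")):
        try:
            cpulist = (node_dir / "cpulist").read_text().strip()
            meminfo = (node_dir / "meminfo").read_text()
        except OSError:
            continue  # skip nodes we cannot read rather than failing
        if cpulist:
            continue  # node has CPUs attached, so it is ordinary socket-local memory
        match = re.search(r"MemTotal:\s+(\d+)\s+kB", meminfo)
        if match and int(match.group(1)) > 0:
            node_id = int(node_dir.name[len("node"):])
            result[node_id] = int(match.group(1)) * 1024
    return result


if __name__ == "__main__":
    for node_id, size in cpuless_memory_nodes().items():
        print(f"node {node_id}: {size / 2**30:.0f} GiB, no local CPUs")
```

On a host without CPU-less nodes, or on a non-Linux system, the script prints nothing. The node number it reports is the value to pass to numactl.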

Near-Data Processing With RISC-V Cores

XCENA hasn't published exact core counts or independent benchmark results - the MX1 is still pre-production. What the company describes is a set of workloads the RISC-V cores offload from the host CPU: KV cache orchestration, vector database queries, prefetch decisions, compression, and access pattern tracking. The FP32/FP16 vector engines handle operations that need floating-point precision without the round-trip overhead of moving tensor data across PCIe.
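The appeal of running vector search next to the data comes down to how little has to cross the link. The toy comparison below is a sketch under stated assumptions: brute-force dot-product scoring, FP32 vectors, and 8-byte result entries. It is not XCENA's implementation, and both function names are invented for illustration.

```python
import heapq


def top_k_near_data(vectors, query, k):
    """Score every stored vector where it lives and return only the k best (index, score) pairs."""
    scored = ((sum(a * b for a, b in zip(vec, query)), idx)
              for idx, vec in enumerate(vectors))
    return [(idx, score) for score, idx in heapq.nlargest(k, scored)]


def bytes_over_link(num_vectors, dim, k, bytes_per_value=4, near_data=True):
    """Rough bytes crossing the host link for one query (assumed FP32 values, 8-byte results)."""
    if near_data:
        return dim * bytes_per_value + k * 8      # query goes in, k results come out
    return num_vectors * dim * bytes_per_value    # host must pull every vector to score it


n, dim, k = 1_000_000, 768, 10
print(bytes_over_link(n, dim, k, near_data=False))  # 3072000000
print(bytes_over_link(n, dim, k, near_data=True))   # 3152
```

For a million 768-dimensional vectors, the host-side scan moves about 3 GB per query while the near-data version moves a few kilobytes. Real systems use indexes rather than brute force, so the gap in practice depends on the index and access pattern.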

This is processing-in-memory (PIM), a category Samsung and SK Hynix have both prototyped - their AXDIMM and AiM chips respectively - but neither has reached mass-market production for inference workloads. XCENA's founders came from both companies. Their read is that CXL 3.2's cache-coherent semantics finally solve the integration problem that stalled those earlier attempts. The growing RISC-V ecosystem for AI workloads - demonstrated most recently by Alibaba's XuanTie C950 being the first RISC-V CPU to run LLM inference natively - suggests the tooling is maturing in time.

InfiniteMemory - SSD-Backed Capacity

The MX1 also connects to NVMe SSDs over PCIe 6.0, creating what XCENA calls InfiniteMemory: a tiered address space where hot data lives in DDR5 and cold data spills to flash. The RISC-V management cores handle tier migration autonomously. The host sees a single flat address space that can extend to petabyte scale, with access latency varying by tier.
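The hot/cold behavior can be illustrated with a toy two-tier store. This is a minimal sketch assuming a least-recently-used demotion policy and promote-on-access; XCENA has not published its actual tiering policy, and the class name is hypothetical.

```python
from collections import OrderedDict


class TwoTierStore:
    """Toy hot/cold store: a bounded 'DRAM' tier with LRU demotion to an unbounded 'flash' tier."""

    def __init__(self, hot_capacity):
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()   # most recently used entries sit at the end
        self.cold = {}

    def put(self, key, value):
        self.cold.pop(key, None)   # a key lives in exactly one tier
        self.hot[key] = value
        self.hot.move_to_end(key)
        self._demote_if_full()

    def get(self, key):
        """Return (value, tier), where tier names where the value was found."""
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key], "hot"
        value = self.cold.pop(key)  # raises KeyError if the key was never stored
        self.put(key, value)        # promote on access
        return value, "cold"

    def _demote_if_full(self):
        while len(self.hot) > self.hot_capacity:
            old_key, old_value = self.hot.popitem(last=False)  # least recently used
            self.cold[old_key] = old_value


store = TwoTierStore(hot_capacity=2)
for name in ("a", "b", "c"):
    store.put(name, name.upper())
print(store.get("a"))   # ('A', 'cold'): "a" was demoted when "c" arrived
print(store.get("a"))   # ('A', 'hot'): the first read promoted it
```

The caller sees one key space throughout, and only the latency tag changes, which mirrors the single flat address space described above.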

XCENA's InfiniteMemory design layers DDR5 DRAM over NVMe SSD, with onboard RISC-V controllers managing data placement between tiers autonomously. Source: servethehome.com

Variant Comparison

XCENA plans to offer the MX1 in two configurations targeting different slot requirements:

Spec	MX1P	MX1S
PCIe interface	Single Gen6 x16	Dual Gen6 x8
CXL version	3.0 / 3.2	3.2
Memory	DDR5-8400, quad-channel	DDR5-8400, quad-channel
Max capacity	1TB (256GB DIMMs)	1TB (256GB DIMMs)
Required host OS	Linux 6.2+ with CXL	Linux 6.2+ with CXL
CPU requirement	PCIe Gen6 + CXL 3.x	PCIe Gen6 + CXL 3.x
Production target	Late 2026	2026
Revenue target	2027	2027

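Both variants require a host platform with PCIe Gen6 support and CXL 3.x capability to run at full specification. Today's mainstream server CPUs fall short of that: Intel Xeon 6 tops out at PCIe 5.0 with CXL 2.0, and AMD EPYC Genoa at PCIe 5.0 with CXL 1.1, so on those hosts the card would at best negotiate down to an older link speed and feature set. Older Xeon Scalable generations lack CXL 3.x support as well and won't work.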

Who Is XCENA and What the $135M Funds

XCENA was founded in 2022 under the name MetisX by three veterans of South Korea's memory industry. CEO Jin Kim and CPO Harry Juhyun Kim came from SK Hynix; CTO Kim DoHun came from Samsung. The founding thesis was that decades of memory engineering had focused entirely on density and speed while leaving computation at the memory tier untouched.

The funding history: roughly $6 million in seed in 2022, a $44 million Series A in 2024, and now the $135 million Series B. The new capital will go toward final production engineering on the MX1P, the MX1S tape-out, engineering headcount growth in Sunnyvale, and early customer deployments at hyperscale inference providers. The company has over 90 employees split between Pangyo, South Korea, and Sunnyvale, California.

XCENA won "Most Innovative Memory Technology" at FMS 2025, where it first showed working MX1P samples. Those samples began shipping to select partners in late 2025.

This funding fits a pattern that's been running since late 2024, where AI chip startups have been raising at the billion-dollar scale across memory, interconnect, and inference acceleration - each targeting a different layer of the same infrastructure bottleneck.

Where It Falls Short

XCENA hasn't published independent benchmark data for the MX1. The claim that one MX1-equipped server can replace ten standard machines comes from the company's own analysis and hasn't been verified externally. With the MX1P still moving from partner samples to production, those numbers should be treated as design targets rather than measured results.

CXL 3.2 adoption in production infrastructure is still early. The hyperscalers most likely to buy XCENA's hardware at scale are still rolling out first-generation CXL 1.1 deployments. XCENA needs CXL 3.2 to reach broad hardware availability at roughly the same time its chips are ready to ship in volume - a coordination that isn't guaranteed and that neither XCENA nor its customers control.

The software story also needs work. XCENA ships an LLVM-based toolchain and an SDK, but getting inference workloads to efficiently offload KV cache management to the device requires application-level integration that most ML infrastructure teams haven't had to write before. The Linux NUMA path lowers the bar significantly, but optimized usage still demands real engineering time from customers.

Revenue isn't expected until 2027, which means the $185M in total funding needs to cover at least 18 more months of operations before the company produces returns. That's achievable but leaves thin margin for production delays or slower-than-expected CXL rollout at hyperscalers.

The core thesis - that memory bandwidth is the binding constraint on inference throughput per dollar - is sound, and the team's credibility in memory engineering is real. Whether the CXL 3.2 ecosystem timeline aligns with XCENA's production schedule is what investors are actually betting on with this round.
