Qualcomm AI250 - Near-Memory Computing for Inference

The Qualcomm AI250 applies near-memory computing to the same 768GB LPDDR5X design as the AI200, promising 10x higher effective memory bandwidth and lower power for LLM inference at rack scale.

TL;DR

  • Same 768GB LPDDR5X memory as the AI200 but with near-memory computing that Qualcomm claims delivers 10x higher effective bandwidth
  • Notably lower power consumption than AI200 despite the same 160 kW rack envelope
  • Hexagon NPU architecture, confidential computing built-in, PCIe scale-up and Ethernet scale-out
  • Humain (Saudi Arabia, 200MW deployment) is confirmed as an early customer; availability 2027

Overview

The Qualcomm AI250 was announced in October 2025 alongside the AI200, but the two chips won't ship together. The AI200 targets 2026 commercial availability; the AI250 follows in 2027. Both carry the same 768GB of LPDDR5X memory per accelerator card - a figure that dwarfs the 192GB of HBM3E on NVIDIA's B200 - but they serve that capacity through fundamentally different memory architectures.

The AI250's headline innovation is near-memory computing. Instead of fetching data from LPDDR5X into the compute core to execute operations, the AI250 moves compute logic close to the memory arrays themselves. The effect, according to Qualcomm, is an effective memory bandwidth increase of more than 10x compared to the AI200, combined with meaningfully lower power consumption. If that claim holds up under real workloads, it's architecturally significant.

LLM inference bottlenecks on memory bandwidth, not raw TFLOPS. A model with 70 billion parameters at FP16 weighs 140GB; each generated token requires reading essentially all of those weights from memory once. The bandwidth available to the chip sets the ceiling on generation speed, regardless of how many multiply-accumulate units the chip has. Qualcomm's bet is that changing where the computation happens - at the memory rather than the compute core - can break this bottleneck more efficiently than adding more HBM bandwidth.
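That ceiling is simple arithmetic. The sketch below is a back-of-the-envelope model; the two bandwidth figures in the loop are illustrative assumptions, not published specs for any chip discussed here.

```python
# Upper bound on decode speed for a bandwidth-bound LLM:
# every generated token reads (roughly) all weights once, so
# tokens/sec <= memory_bandwidth / model_size_in_bytes.

def max_tokens_per_sec(params: float, bytes_per_param: float,
                       bandwidth_gbps: float) -> float:
    """Bandwidth-bound ceiling on single-stream token generation."""
    model_bytes = params * bytes_per_param          # 70e9 * 2 = 140 GB at FP16
    return bandwidth_gbps * 1e9 / model_bytes

# Illustrative bandwidths (assumptions, not vendor figures):
for name, bw in [("8,000 GB/s (HBM3E-class)", 8000),
                 ("800 GB/s (LPDDR-class)", 800)]:
    print(f"{name}: <= {max_tokens_per_sec(70e9, 2, bw):.1f} tokens/s")
```

At 8,000 GB/s the ceiling for a 140GB model is about 57 tokens/s per stream; at a tenth the bandwidth, about 5.7. That 10x spread is exactly the gap Qualcomm's effective-bandwidth claim would need to close.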

Key Specifications

| Specification | Details |
| --- | --- |
| Manufacturer | Qualcomm |
| Product Family | Cloud AI |
| Chip Type | ASIC (Hexagon NPU) |
| Process Node | Not disclosed |
| Memory | 768 GB LPDDR5X per card |
| Effective Memory Bandwidth | 10x AI200 (absolute figure not disclosed) |
| FP8 Performance | Not disclosed |
| Architecture Innovation | Near-memory computing |
| Security | Confidential computing built-in |
| Scale-up Interconnect | PCIe |
| Scale-out Interconnect | Ethernet |
| Cooling | Direct liquid cooling (DLC) |
| Rack Power | 160 kW |
| Target Workload | Inference |
| Availability | 2027 |

Performance Benchmarks

Qualcomm hasn't published TOPS or TFLOPS for the AI250. The company doesn't release those numbers for the AI200 either, which makes direct quantitative comparison with NVIDIA or AMD difficult.

| Metric | Qualcomm AI250 | Qualcomm AI200 | NVIDIA B200 | AMD MI300X |
| --- | --- | --- | --- | --- |
| Memory Capacity | 768 GB LPDDR5X | 768 GB LPDDR5X | 192 GB HBM3E | 192 GB HBM3 |
| Effective Memory BW | 10x AI200 | Baseline | 8,000 GB/s | 5,300 GB/s |
| FP8 TFLOPS | Not disclosed | Not disclosed | 9,000 | 5,300 |
| Power | 160 kW (rack) | 160 kW (rack) | ~700 W (per GPU) | ~750 W (per GPU) |
| Cooling | DLC | DLC | Air or liquid | Liquid |
| Availability | 2027 | 2026 | Available | Available |

The memory capacity comparison is the most concrete: 768GB per card versus 192GB on B200 or MI300X. A single Qualcomm card can hold a model of roughly 380 billion parameters at FP16 (or about twice that at FP8) without requiring model parallelism across multiple cards; holding the same weights on NVIDIA hardware takes roughly four B200s. For inference serving, fewer cards per model means simpler orchestration and lower power per model instance.
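The card-count arithmetic can be sketched with a small (hypothetical) helper; it counts weight storage only, which understates real requirements.

```python
import math

def cards_needed(params: float, bytes_per_param: int,
                 card_capacity_gb: int) -> int:
    """Minimum accelerator cards to hold the weights alone.
    Ignores KV cache, activations, and runtime overhead,
    all of which raise the real requirement."""
    model_gb = params * bytes_per_param / 1e9
    return math.ceil(model_gb / card_capacity_gb)

# 380B-parameter model at FP16 = 760 GB of weights.
print(cards_needed(380e9, 2, 768))  # 768 GB card (AI250-class): 1
print(cards_needed(380e9, 2, 192))  # 192 GB card (B200/MI300X-class): 4
```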

What's still missing: absolute bandwidth numbers. "10x AI200" is meaningful only if we know what AI200's actual bandwidth is - which Qualcomm hasn't published. For buyers assessing the AI250, this is a significant blind spot that won't resolve until third-party benchmarks appear after launch.

Key Capabilities

Near-Memory Computing Architecture. Near-memory computing places logic circuits inside or alongside the DRAM die, rather than in a separate processor die reached over a memory bus. For read-heavy workloads like LLM inference, this shortens the data path: instead of a memory-controller round-trip of several hundred nanoseconds per access, much of the work happens next to the memory arrays. The bandwidth that matters for inference isn't the raw throughput of the memory interface but how quickly the compute can consume weights during token generation, and near-memory compute changes that equation by doing work inside the memory subsystem.
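A toy traffic model makes the appeal concrete: for a matrix-vector product (the core operation of token generation), a conventional design streams every weight across the memory bus, while an idealized near-memory design moves only the input vector in and the results out. This is a simplified illustration of the general technique, not a model of Qualcomm's actual implementation.

```python
# Toy bus-traffic model for one matrix-vector product y = W @ x,
# with W of shape (d_out, d_in) stored in DRAM at 2 bytes/element.
d_in, d_out, bytes_per = 8192, 8192, 2

# Conventional: all weights cross the memory bus to reach the compute core.
conventional = d_in * d_out * bytes_per

# Idealized near-memory: multiply-accumulate happens at the arrays,
# so only x (broadcast in) and y (results out) cross the bus.
near_memory = (d_in + d_out) * bytes_per

print(conventional / near_memory)  # prints 4096.0
```

The real-world gain is far smaller than this idealized 4096x - commands, partial sums, and non-matvec work all still move data - but the direction of the effect is why "effective bandwidth" can exceed the raw interface bandwidth.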

This architectural approach isn't new in HPC - Processing-in-Memory (PIM) and near-memory compute have been research topics for decades. Qualcomm's claim is that it has made the approach commercially viable at the scale of a data center inference card. SK Hynix has demonstrated similar technology with its AiMX chip, and Samsung has explored it for HBM. If Qualcomm's implementation is production-quality, it could set a new efficiency benchmark for bandwidth-intensive workloads.

768GB LPDDR5X at Scale. LPDDR5X is a less conventional choice for data center AI than HBM. It's lower bandwidth per pin than HBM, which is why Qualcomm's near-memory compute claim is load-bearing - the architecture has to compensate for the raw bandwidth gap against HBM3E. The upside of LPDDR is cost: LPDDR5X modules are substantially cheaper per gigabyte than HBM3E stacks, which is central to Qualcomm's total cost of ownership argument. A 768GB AI250 card doesn't need the exotic HBM packaging that drives up cost on NVIDIA and AMD products.
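The cost argument can be sketched numerically. The per-gigabyte prices below are round placeholder assumptions chosen purely for illustration - Qualcomm has published no bill-of-materials figures, and actual memory pricing varies widely.

```python
# Rough memory bill-of-materials comparison.
# Both $/GB figures are placeholder assumptions, not market prices.
LPDDR5X_USD_PER_GB = 3.0   # assumption
HBM3E_USD_PER_GB = 12.0    # assumption

lpddr_card = 768 * LPDDR5X_USD_PER_GB   # 768 GB LPDDR5X card
hbm_card = 192 * HBM3E_USD_PER_GB       # 192 GB HBM3E card

print(f"768 GB LPDDR5X: ${lpddr_card:,.0f}")
print(f"192 GB HBM3E:   ${hbm_card:,.0f}")
```

Under these placeholder numbers the two memory bills come out comparable - i.e. roughly 4x the capacity for a similar memory spend - which is the shape of the TCO argument, whatever the true prices are.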

Confidential Computing. Both AI200 and AI250 include hardware-level confidential computing features - encryption and isolation for inference workloads. This is increasingly required for enterprise deployments handling sensitive data, and it's built into the Hexagon NPU architecture rather than added as a software layer. For regulated industries (finance, healthcare, government), this matters.

Pricing and Availability

Qualcomm hasn't published pricing. The company's reference point is the 200-megawatt AI infrastructure deployment with Humain in Saudi Arabia - both AI200 and AI250 are confirmed in that deal. The scale of that deployment implies production pricing exists, but hasn't been made public.

Commercial availability is 2027. That's a meaningful delay versus AI200 (2026), NVIDIA Vera Rubin (2026-H2), and AMD MI455X (2026-H2). Buyers assessing inference infrastructure choices for 2026 don't have the AI250 as an option. The AI200 ships in 2026, with AI250 following as a higher-efficiency successor for customers willing to wait.

The 160 kW rack power envelope is identical between AI200 and AI250 despite the claimed power reduction. Qualcomm appears to be using the same rack density but doing more compute or serving more throughput per kilowatt, rather than reducing the total rack footprint.

Strengths and Weaknesses

Strengths

  • 768 GB LPDDR5X per card - 4x the memory capacity of NVIDIA B200 or AMD MI300X
  • Near-memory computing architecture targets the core bottleneck in LLM inference (bandwidth, not compute)
  • Claimed 10x effective memory bandwidth improvement over AI200
  • Lower power consumption than AI200 within the same 160 kW rack envelope
  • Confidential computing built into the Hexagon NPU architecture
  • Humain partnership validates commercial traction at data center scale
  • LPDDR5X cost advantage vs HBM3E could reduce per-card pricing versus HBM alternatives

Weaknesses

  • Availability is 2027 - can't compete for 2026 infrastructure decisions
  • FP8 TFLOPS not disclosed - impossible to benchmark against published GPU numbers
  • Process node not disclosed - limits architectural analysis
  • Absolute memory bandwidth not disclosed ("10x AI200" is relative to an undisclosed baseline)
  • No track record in production data center inference - AI200 hasn't shipped widely yet
  • LPDDR5X raw bandwidth lower than HBM; near-memory compute must close the gap to compete

Related

  • Qualcomm AI200 - The current-generation counterpart shipping in 2026
  • NVIDIA B200 - The primary HBM3E-based competitor for inference
  • AMD MI300X - AMD's widely deployed 192GB inference accelerator

Sources

Last verified May 1, 2026

About the author

James Kowalski, AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.