NVIDIA RTX Spark - ARM Blackwell Superchip for AI PCs
NVIDIA RTX Spark is a 20-core ARM + Blackwell GPU superchip delivering 1 petaFLOP FP4 and 128GB unified memory for AI-first Windows laptops and desktops.

TL;DR
- 20-core ARM CPU (10x Cortex-X925 + 10x Cortex-A725) paired with a Blackwell RTX GPU carrying 6,144 CUDA cores
- 128GB LPDDR5x unified memory at 300 GB/s - runs 120-billion-parameter models locally with 1M token context
- 1 petaFLOP FP4 AI compute via 5th-generation Tensor Cores; 70 billion transistors on TSMC 3nm
- NVIDIA's play against Apple Silicon and Qualcomm Snapdragon for the high-end Windows AI PC market
Overview
NVIDIA announced the RTX Spark superchip at Computex 2026 on June 1, marking the company's formal entry into the personal computing SoC market. RTX Spark combines a 20-core NVIDIA Grace CPU with a Blackwell RTX GPU in a single package, connected via NVIDIA's NVLink-C2C chip-to-chip interconnect. The result is a 70-billion-transistor SoC built on TSMC's 3nm process that shares a unified 128GB pool of LPDDR5x memory across both processors.
This is NVIDIA positioning itself in a market Apple has controlled with the M-series. The M4 Max (see our coverage) offers 546 GB/s of memory bandwidth against RTX Spark's 300 GB/s - a real gap. But NVIDIA's counter is architectural: 5th-generation Tensor Cores with native FP4 support give the Blackwell GPU capabilities Apple's Neural Engine doesn't match at this form factor. Running a 120-billion-parameter model locally isn't something any current M4 device can do without quantization tricks that sacrifice quality.
The same underlying chip already ships inside the DGX Spark personal AI supercomputer at $4,699 - a product NVIDIA launched at GTC 2026. RTX Spark rebrands the GB10 Grace Blackwell chip for OEM laptop and desktop integration, with over 30 laptop models expected from ASUS, Dell, HP, Lenovo, Microsoft, and MSI arriving fall 2026. Microsoft's Surface Laptop Ultra is among the first announced.
NVIDIA isn't claiming RTX Spark beats Apple on every axis. It isn't. But for local AI inference with large models, the Tensor Core and FP4 advantage matters more than peak memory bandwidth.
Key Specifications
| Specification | Details |
|---|---|
| Manufacturer | NVIDIA |
| Product Family | RTX Spark (GB10 Grace Blackwell) |
| Chip Type | SoC |
| Process Node | TSMC 3nm |
| Transistor Count | 70 billion |
| CPU Architecture | NVIDIA Grace (ARM) |
| CPU Cores | 20 (10x Cortex-X925 + 10x Cortex-A725) |
| CPU Max Frequency | 4.1 GHz (performance cores) |
| GPU Architecture | NVIDIA Blackwell |
| CUDA Cores | 6,144 |
| Tensor Cores | 5th Generation (FP4 support) |
| RT Cores | 4th Generation |
| Memory | 128GB LPDDR5x |
| Memory Interface | 256-bit |
| Memory Bandwidth | 300 GB/s |
| AI Performance | 1,000 TOPS / 1 PFLOP (FP4 with sparsity) |
| CPU-GPU Interconnect | NVLink-C2C |
| TDP (SoC) | 140W |
| Release Date | June 2026 (DGX Spark) / Fall 2026 (OEM devices) |
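
Two rows in the table above can be cross-checked against each other: a 256-bit interface moves 32 bytes per transfer, so the quoted 300 GB/s implies a particular per-pin data rate. The sketch below is simple arithmetic on the table's figures, not an NVIDIA-published number; the function name is illustrative.

```python
def implied_data_rate_gtps(bandwidth_gb_s: float, bus_width_bits: int) -> float:
    """Per-pin transfer rate (GT/s) implied by a peak bandwidth and bus width."""
    bytes_per_transfer = bus_width_bits / 8  # 256 bits -> 32 bytes per transfer
    return bandwidth_gb_s / bytes_per_transfer


# Figures taken from the specification table: 300 GB/s over a 256-bit bus.
rate = implied_data_rate_gtps(300, 256)
print(f"Implied LPDDR5x data rate: {rate:.3f} GT/s")
```

This is a peak figure; sustained bandwidth in real workloads is typically lower.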
Performance Benchmarks
No third-party benchmark data exists for RTX Spark laptops yet - OEM devices don't ship until fall 2026. The DGX Spark desktop (same chip) has built up some real-world numbers since its GTC 2026 launch.
| Metric | RTX Spark / DGX Spark | Apple M4 Max | Qualcomm X Elite |
|---|---|---|---|
| Peak AI Compute (vendor-quoted; FP4 for RTX Spark) | 1,000 TOPS | ~40 TOPS (Neural Engine) | 45 TOPS (NPU) |
| Unified Memory | 128GB | 128GB (max config) | 64GB (max) |
| Memory Bandwidth | 300 GB/s | 546 GB/s | 135 GB/s |
| Max LLM Parameters (local) | 120B | ~40-50B (practical) | ~13B (practical) |
| TDP (SoC) | 140W | ~36W (M4 Max) | ~23W |
| Process Node | 3nm | 3nm | 4nm |
The memory bandwidth deficit versus Apple is real. For memory-bandwidth-bound inference - which describes most LLM serving - an M4 Max will outrun an RTX Spark on per-watt throughput for the same model size. The trade-off flips for model sizes above roughly 50 billion parameters, where RTX Spark can run natively in FP4 while Apple devices must resort to aggressive compression.
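To see why bandwidth dominates, here is a rough back-of-the-envelope sketch. It assumes a dense model at batch size 1 in which every weight is read from memory once per generated token, and it ignores KV-cache traffic, compute limits, and software overhead. Under those assumptions, decode speed cannot exceed bandwidth divided by the weight footprint. The function names are illustrative, and the results are upper bounds, not measured benchmarks.

```python
def weight_bytes(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in bytes for a dense model."""
    return params_billions * 1e9 * bits_per_weight / 8


def decode_tokens_per_second_bound(
    bandwidth_gb_s: float, params_billions: float, bits_per_weight: float
) -> float:
    """Upper bound on single-stream decode speed when memory-bandwidth-bound."""
    return bandwidth_gb_s * 1e9 / weight_bytes(params_billions, bits_per_weight)


# Bandwidth figures from the comparison table; 4-bit weights assumed on both.
for name, bandwidth in (("RTX Spark", 300), ("M4 Max", 546)):
    for params in (40, 120):
        bound = decode_tokens_per_second_bound(bandwidth, params, 4)
        print(f"{name}: {params}B at 4 bits -> at most {bound:.1f} tokens/s")
```

A 120B model at 4 bits per weight occupies about 60 GB, so the bound depends only on how quickly each platform can stream that footprint. Mixture-of-experts models read only a fraction of their weights per token and would land above this dense-model bound.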
NVIDIA claims RTX Spark laptops run Adobe Premiere and Photoshop twice as fast as competing platforms after application-level optimization. These claims had not been verified by independent testing at publication date; this page will be updated when review hardware ships.
Key Capabilities
NVLink-C2C Integration
The Grace CPU and Blackwell GPU share memory directly through NVLink-C2C rather than going through a discrete PCIe connection. This eliminates the copy overhead that penalizes GPU-offloaded tasks on conventional designs. For agentic AI workloads - where a language model, a vision system, and a code executor might all be running simultaneously - shared memory with zero-copy access is a meaningful advantage over PCIe-connected discrete GPU designs.
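For a sense of the copy overhead being avoided, the sketch below estimates how long an explicit host-to-GPU transfer takes over a PCIe link. The PCIe generation, lane count, and payload sizes are assumptions chosen for illustration, not figures from NVIDIA; the bandwidth formula uses the 128b/130b line encoding of PCIe 3.0 and later and gives a theoretical peak per direction.

```python
def pcie_bandwidth_gb_s(gt_per_s: float, lanes: int) -> float:
    """Theoretical one-direction PCIe bandwidth in GB/s (128b/130b encoding)."""
    return gt_per_s * lanes * (128 / 130) / 8


def copy_seconds(payload_gb: float, bandwidth_gb_s: float) -> float:
    """Time to move a payload across the link at the given bandwidth."""
    return payload_gb / bandwidth_gb_s


# Assumed link: PCIe 5.0 (32 GT/s per lane) with 16 lanes.
link = pcie_bandwidth_gb_s(32, 16)
print(f"PCIe 5.0 x16 peak: {link:.1f} GB/s per direction")

# Hypothetical payloads: a batch of image tensors and a full 60 GB weight set.
for label, size_gb in (("2 GB activation batch", 2), ("60 GB weight set", 60)):
    print(f"{label}: about {copy_seconds(size_gb, link):.2f} s to copy")
```

With a shared memory pool the copy step disappears, though both processors still compete for the same 300 GB/s of memory bandwidth.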
FP4 Tensor Core Support
5th-generation Tensor Cores handle native FP4 matrix multiplication with sparsity acceleration. Because FP4 weights take half the space of FP8 weights, the Blackwell GPU can fit a model twice as large in the same memory budget, or achieve roughly 2x throughput on the same model. Apple's Neural Engine doesn't natively support FP4. Qualcomm's NPU supports INT4 but not the OCP MX FP4 format NVIDIA uses, which has better dynamic range than integer quantization.
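The dynamic-range point can be illustrated with a minimal sketch. It assumes the E2M1 element layout used for FP4 in the OCP Microscaling formats (1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1, no infinities or NaNs) and compares it with symmetric signed INT4. It decodes individual 4-bit codes only; the shared per-block scale factor that both schemes rely on in practice is left out, and the function name is illustrative.

```python
def decode_e2m1(code: int) -> float:
    """Decode a 4-bit E2M1 code (sign | 2-bit exponent | 1-bit mantissa)."""
    sign = -1.0 if (code >> 3) & 1 else 1.0
    exponent = (code >> 1) & 0b11
    mantissa = code & 0b1
    if exponent == 0:
        magnitude = 0.5 * mantissa  # subnormal: 0 or 0.5
    else:
        magnitude = 2.0 ** (exponent - 1) * (1 + 0.5 * mantissa)
    return sign * magnitude


fp4_magnitudes = [decode_e2m1(code) for code in range(8)]
int4_magnitudes = list(range(8))  # symmetric INT4 magnitudes 0..7

# Ratio of largest to smallest nonzero magnitude, before any block scaling.
fp4_range = max(fp4_magnitudes) / min(m for m in fp4_magnitudes if m > 0)
int4_range = max(int4_magnitudes) / min(m for m in int4_magnitudes if m > 0)

print("FP4 (E2M1) magnitudes:", fp4_magnitudes)
print("INT4 magnitudes:      ", int4_magnitudes)
print(f"Range ratio: FP4 {fp4_range:.0f}x vs INT4 {int4_range:.0f}x")
```

The FP4 levels are spaced more finely near zero and more coarsely at the top, which tends to suit weight distributions clustered around zero with occasional outliers.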
Local Reasoning with Long Context
NVIDIA's specification sheet claims 1 million token context length on-device at up to 120 billion parameters. In practice, context length and model size interact through the KV cache - a 1M context at 120B parameters would demand massive KV cache memory - but the raw memory pool of 128GB makes it feasible where 64GB devices cannot go.
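The interaction between context length and the KV cache can be made concrete with a short estimate. The layer count, KV-head count, and head dimension below describe a hypothetical dense transformer with grouped-query attention; they are placeholders, not the configuration of any model NVIDIA has named. The formula counts one key and one value vector per layer per token.

```python
def kv_cache_bytes(
    layers: int, kv_heads: int, head_dim: int, tokens: int, bytes_per_value: float
) -> float:
    """KV cache size in bytes: keys plus values for every layer and token."""
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_value


# Hypothetical architecture (assumption): 80 layers, 8 KV heads, head dim 128.
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128
TOKENS = 1_000_000

for label, width in (("FP16", 2), ("FP8", 1), ("FP4", 0.5)):
    size_gb = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, TOKENS, width) / 1e9
    print(f"{label} KV cache at 1M tokens: {size_gb:.1f} GB")
```

Whether a given combination fits in 128GB alongside the weights depends heavily on these architectural choices and on KV-cache precision; techniques such as sliding-window attention or cache compression change the numbers substantially.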
Gaming and Content Creation
RTX Spark isn't only an AI chip. Its 6,144 CUDA cores, paired with DLSS 4.5 Multi Frame Generation, target 100 FPS at 1440p in games. The Blackwell media engine handles 12K 4:2:2 video editing and 4K AI video generation locally. For creators who also do AI work, this replaces the awkward two-device workflow of a MacBook and a separate AI workstation.
Pricing and Availability
Image: Microsoft Surface Laptop Ultra - one of the first OEM devices built around the NVIDIA RTX Spark superchip, announced at Computex 2026. Source: windowscentral.com
The DGX Spark desktop - the first commercial product using the GB10 chip - launched at $4,699 for the 4TB storage configuration. That price covers the full desktop system including power supply and NVMe storage, not just the chip.
OEM laptop pricing hasn't been announced. With the MacBook Pro M4 Max starting at $2,499 as the reference point among competing premium platforms, RTX Spark laptops will likely land in the $2,500-$5,000 range. Microsoft's Surface Laptop Ultra, Dell's XPS 16, HP's OmniBook X 14, and Lenovo's Yoga Pro 9n are among the first wave.
For cloud access, NVIDIA's own DGX Cloud platform offers RTX Spark compute for developers who want to test the platform before hardware ships.
Strengths
- 1 petaFLOP FP4 in a laptop form factor - nothing else at this TDP comes close
- 128GB unified memory runs models no other laptop chip can fit
- Full NVIDIA software stack: CUDA, TensorRT, NIM microservices, DLSS, RTX
- Gaming and content creation performance alongside AI workloads in one chip
- NVLink-C2C removes PCIe bottleneck between CPU and GPU
Weaknesses
- 300 GB/s memory bandwidth trails Apple M4 Max's 546 GB/s significantly
- 140W TDP limits battery life and thin-laptop form factors
- OEM software optimization still maturing - Windows ARM ecosystem remains incomplete compared to macOS
- No confirmed pricing for OEM devices yet
- Fall 2026 availability for laptop/desktop products; only DGX Spark desktop ships now
Related Coverage
- NVIDIA RTX 5090 - Blackwell Desktop GPU - the discrete GPU option for builders who don't want a SoC
- Apple M5 Max - the primary competitor in the high-end laptop AI compute space
- NVIDIA GB300 NVL72 - Blackwell Ultra Rack - where the same Blackwell architecture goes at datacenter scale
Sources
✓ Last verified June 1, 2026
