Nvidia Enters the PC Market With RTX Spark Superchip

Nvidia's RTX Spark packs 20 Arm CPU cores and a Blackwell 2.0 GPU with 6,144 CUDA cores into a 45-80 W Windows laptop chip, targeting Apple Silicon head-on.

Nvidia has spent thirty years building the GPUs that now power most of the world's AI clusters. On June 1, Jensen Huang announced the company is also coming for your laptop.

The RTX Spark - internally the N1X - is Nvidia's first System-on-Chip for Windows PCs. It combines a 20-core Arm CPU with a Blackwell 2.0 GPU on a single TSMC 3nm package, runs the full CUDA software stack, and ships this fall in laptops from Microsoft, Dell, HP, ASUS, Lenovo, and MSI. Over 30 laptop and 10 desktop designs are already confirmed. The thinnest will be 14 millimeters.

For Nvidia, the PC market is a new front. For Qualcomm, whose Windows-on-Arm exclusivity deal with Microsoft expired earlier this year, the timing is poor.

Key Specs - Nvidia RTX Spark (N1X)

Component | Spec
CPU | 20 Arm cores (10x Cortex-X925 + 10x Cortex-A725)
GPU | Blackwell 2.0, 48 SMs, 6,144 CUDA cores
Memory | Up to 128 GB LPDDR5X, 16-channel
Bandwidth | 300 GB/s via NVLink C2C
AI Compute | 1 petaflop (claimed)
TDP | 45-80 W (full package)
Process | TSMC 3nm
Launch | Fall 2026

The CPU Half

20 Arm Cores, Two Classes

The N1X CPU die uses a classic big.LITTLE split: ten Cortex-X925 performance cores for single-threaded heavy lifting and ten Cortex-A725 efficiency cores for background tasks and power-sensitive workloads. The CPU and GPU dies connect via Nvidia's NVLink C2C interface at up to 300 GB/s - the same interconnect it uses in HGX AI server racks, now shrunk to fit in a laptop chassis.

The standard N1 (non-X) variant ships with 12 cores and 64 GB of LPDDR5X across an 8-channel bus. It targets the mid-range of the laptop market, where Qualcomm's Snapdragon X sits today.

TSMC 3nm - and What That Means for Thermals

Manufacturing on TSMC's 3nm node gives the N1X room to fit its GPU into a 45-80 W envelope without overheating a thin chassis. The TDP figure covers the entire package, CPU and GPU dies together. That's the same power budget as a high-end Snapdragon X Elite, but with substantially more raw GPU compute underneath it.

Jensen Huang on stage at GTC Taipei 2026 at Taipei Music Center on June 1, announcing Nvidia's entry into the Windows PC chip market. Source: servethehome.com

The GPU Half

6,144 CUDA Cores - Same Count as RTX 5070

The Blackwell 2.0 GPU die packs 48 Streaming Multiprocessors, which works out to 6,144 CUDA cores - the same core count as Nvidia's desktop RTX 5070. Nvidia claims 1 petaflop of AI compute from the combination, which matches what you'd get from a standalone discrete GPU two product generations ago.
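The core count follows directly from the SM count. As a minimal sketch, assuming 128 CUDA cores per SM (the ratio implied by the article's own figures of 6,144 cores across 48 SMs), the arithmetic looks like this:

```python
# Assumption: 128 CUDA cores per Streaming Multiprocessor, the ratio
# implied by the article's figures (6,144 cores / 48 SMs).
CUDA_CORES_PER_SM = 128


def cuda_cores(sm_count: int, cores_per_sm: int = CUDA_CORES_PER_SM) -> int:
    """Total CUDA cores for a GPU with the given number of SMs."""
    return sm_count * cores_per_sm


print(cuda_cores(48))  # N1X GPU die: 48 SMs
```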

That GPU runs the full Blackwell feature set: DLSS 4.5 with Multi Frame Generation, ray tracing, and Reflex. For AI inference, the pitch is that the same CUDA libraries that run on H100s and B200s run on RTX Spark unchanged.

Running Local Models on RTX Spark

The 128 GB unified memory pool is the spec that matters most for developers. A 4-bit quantized 70B model loads in seconds - the entire weight set fits with room for context. There's no discrete VRAM cliff to fall off.
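A rough way to check that claim is to estimate the weight footprint from parameter count and bits per weight. The sketch below is a back-of-the-envelope estimate, not a measurement: it assumes exactly 4 bits per weight (real 4-bit formats carry some extra overhead) and ignores the KV cache, activations, and whatever the operating system is using. The function name is illustrative.

```python
UNIFIED_MEMORY_GB = 128  # top N1X configuration quoted in the article


def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


for name, params_b in [("14B", 14), ("70B", 70)]:
    size = weight_footprint_gb(params_b, bits_per_weight=4.0)
    headroom = UNIFIED_MEMORY_GB - size
    print(f"{name} at 4-bit: ~{size:.0f} GB of weights, ~{headroom:.0f} GB left over")
```

Under these assumptions a 70B model needs about 35 GB for weights alone, which leaves most of the 128 GB pool for context and everything else running on the machine.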

# RTX Spark (N1X): 128 GB unified, full CUDA stack on Arm

# Install Ollama (Windows on Arm build ships at launch)
winget install Ollama.Ollama

# 14B models use under 10 GB - trivial on 128 GB
ollama run qwen3:14b

# 4-bit quantized 70B fits comfortably - no paging
ollama run llama3.3:70b-instruct-q4_K_M

Ollama and llama.cpp already have Windows-on-Arm builds. Both should work on RTX Spark at launch without any porting work, since the CUDA layer handles the GPU path the same way it does on x86.
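Once a model is pulled, it can also be driven from code through Ollama's local HTTP API. The following is a minimal sketch using only the Python standard library; it assumes an Ollama server is running on its default port (11434) and that the model tag has already been downloaded. The helper names `build_request` and `generate` are illustrative, not part of Ollama.

```python
import json
import urllib.request

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, calling `generate("qwen3:14b", "Explain unified memory in one sentence.")` would return the model's reply as a string. Nothing in the script is specific to Arm or x86; the hardware difference is handled below the API.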

The Nvidia RTX Spark superchip on display at GTC Taipei 2026: a Blackwell 2.0 GPU and 20-core Arm CPU on a single TSMC 3nm package. Source: servethehome.com

The Memory Architecture

128 GB Unified Pool

The 16-channel LPDDR5X bus gives the N1X 300 GB/s of memory bandwidth shared between CPU and GPU. Apple's M5 Max uses a similar unified memory model, though Apple doesn't publish the exact channel count. The 300 GB/s figure is in the same range as Apple's disclosed bandwidth numbers, though below the M5 Max's roughly 546 GB/s, and far above what a discrete GPU gets once a model spills out of VRAM and has to stream over PCIe (about 64 GB/s per direction on a PCIe 5.0 x16 link).

The N1 tier caps at 64 GB across an 8-channel bus. Both chips use the same NVLink C2C die-to-die interconnect, which Nvidia says adds negligible latency versus a monolithic design.
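Memory bandwidth matters for local models because single-stream text generation is usually bandwidth-bound: each new token has to read roughly the whole weight set once. A simple ceiling on decode speed is therefore bandwidth divided by weight size. The sketch below applies that rule of thumb to the N1X's quoted 300 GB/s, assuming weight sizes of 7 GB and 35 GB (14B and 70B parameters at exactly 4 bits each). Real throughput will be lower, since this ignores KV-cache reads, compute time, and overhead.

```python
N1X_BANDWIDTH_GB_S = 300  # memory bandwidth quoted in the article


def decode_tokens_per_second(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on decode speed if every token reads all weights once."""
    return bandwidth_gb_s / weights_gb


# Assumed weight sizes: parameters x 4 bits / 8 bits per byte.
for label, weights_gb in [("14B at 4-bit", 7.0), ("70B at 4-bit", 35.0)]:
    bound = decode_tokens_per_second(N1X_BANDWIDTH_GB_S, weights_gb)
    print(f"{label}: at most ~{bound:.1f} tokens/s")
```

By this estimate the 70B model tops out in the high single digits of tokens per second, so capacity, not speed, is what the 128 GB pool buys.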

How It Compares

Spec | RTX Spark N1X | Apple M5 Max | Qualcomm X Elite
CPU Cores | 20 (Arm) | 16 (Apple custom) | 12 (Oryon)
GPU Cores | 6,144 CUDA | 40-core GPU | Adreno X1-85
Max Memory | 128 GB | 128 GB | 64 GB
Bandwidth | 300 GB/s | ~546 GB/s | ~273 GB/s
Neural Engine | 1 PFLOP (BF16, Nvidia) | 38 TOPS (Apple) | 75 TOPS (Qualcomm)
TDP | 45-80 W | 45-92 W | 45-80 W
Launch | Fall 2026 | Available now | Available now

The Apple M5 Max's memory bandwidth advantage is real: roughly 546 GB/s against the N1X's 300 GB/s, about 1.8x. The Neural Engine row mixes measurement methodologies (Nvidia's PFLOP figure is BF16 matrix throughput; Apple and Qualcomm publish INT8 TOPS), so a direct apples-to-apples comparison isn't possible until third parties run the same benchmark across all three.
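One way to keep the vendor figures honest is to convert them to a common unit while carrying the precision alongside each number. The sketch below does only that, using the values from the table above; the `AiClaim` type and `comparable` helper are illustrative. Matching units does not make numbers at different precisions comparable, which is the point of the check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AiClaim:
    chip: str
    tera_ops: float  # vendor figure in tera-operations per second
    precision: str   # number format the vendor measured at


CLAIMS = [
    AiClaim("RTX Spark N1X", 1000.0, "BF16"),  # 1 PFLOP = 1,000 TFLOPS
    AiClaim("Apple M5 Max", 38.0, "INT8"),
    AiClaim("Qualcomm X Elite", 75.0, "INT8"),
]


def comparable(a: AiClaim, b: AiClaim) -> bool:
    """Two claims are only comparable if they were measured at the same precision."""
    return a.precision == b.precision


for claim in CLAIMS:
    print(f"{claim.chip}: {claim.tera_ops:,.0f} tera-ops/s at {claim.precision}")

print("N1X vs M5 Max comparable:", comparable(CLAIMS[0], CLAIMS[1]))
```

Even where the precision matches, vendor peak figures say little about sustained performance on a real model, so this is a sanity filter rather than a benchmark.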

The Software Angle

CUDA on Arm Changes the Developer Story

Every CUDA workload that runs on a datacenter GPU runs on RTX Spark without rewriting a line of code. That's the argument Nvidia is making, and it's stronger than Qualcomm's or Apple's, both of which require porting to proprietary APIs or targets (Core ML, Hexagon DSP) for GPU-accelerated AI inference.

PyTorch, TensorFlow, JAX, vLLM - these libraries have CUDA backends that work on Arm. Nvidia says they'll be available at launch. If that's accurate, RTX Spark becomes the first Windows laptop where a developer can pull a CUDA AI project from a Linux server, run it locally with no changes, and get GPU acceleration.
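In practice, "no changes" comes down to device-selection code like the following behaving the same on the laptop as on the server. This is a minimal sketch, assuming a CUDA-enabled PyTorch build is installed for the platform; it falls back to the CPU when PyTorch or a CUDA device is missing. The `pick_device` helper is illustrative.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch can see a CUDA device, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed in this environment
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Running on: {pick_device()}")
```

If Nvidia's launch claims hold, a project that already selects its device this way would pick up the RTX Spark GPU without edits.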

That's not a small thing. Right now the standard workflow for on-device AI development on Windows is to fall back to CPU or write DirectML code. RTX Spark could change that default.

Partner laptops running RTX Spark from ASUS, Dell, Lenovo, and others, shown at Computex 2026. All ship this fall. Source: servethehome.com

The Roadmap

Nvidia outlined three generations of RTX Spark at Computex. The first generation, shipping this fall, uses LPDDR5X memory. The second generation, codenamed Rubin, moves to LPDDR6. A third generation, Rosa Feynman, follows. Nvidia gave no dates for the later two.

The datacenter counterpart, the Vera CPU, is already in full production and shipping to Anthropic, OpenAI, and SpaceX. Huang said Vera creates tokens 1.8x faster than x86 today. The PC chip and the server chip share the same Arm architecture and CUDA software stack - which is exactly what Nvidia wants, since it collapses the PC-to-datacenter development gap.

Where It Falls Short

RTX Spark doesn't have pricing. No configurations have retail prices attached, which makes it impossible to compare cost against the M5 or Snapdragon X. Apple's M5 Max machines start around $2,499. If RTX Spark laptops land notably above that, the addressable market shrinks fast.

The Windows-on-Arm software library is still incomplete. Most mainstream x86 apps run via emulation, which is slower than native code. Apple Silicon has had more than five years to shake out app compatibility; RTX Spark starts from scratch on a different Arm implementation.

Nvidia also has no history of driver support for consumer PC SoCs. Its datacenter drivers are excellent. Its consumer GPU drivers have a long record of bugs, delayed releases, and inconsistent behavior on new hardware. Whether that changes for a platform where the company controls the whole stack remains to be seen.

The x86 vs. Arm debate on agentic AI silicon is far from settled. Intel's and AMD's next-gen mobile chips will have their own AI compute stories by the time RTX Spark ships. The competitive picture in fall 2026 may look different from what Computex suggested.

Jensen Huang called it "the new PC." He says that about a lot of products. The CUDA-on-Arm bet is real, the specs are competitive, and the OEM lineup is serious. The question is whether the software ecosystem catches up before a second generation makes the first one obsolete.


About the author

Sophie Zhang, AI Infrastructure & Open Source Reporter

Sophie is a journalist and former systems engineer who covers AI infrastructure, open-source models, and the developer tooling ecosystem.