Nvidia Enters the PC Market With RTX Spark Superchip
Nvidia's RTX Spark packs 20 Arm CPU cores and a Blackwell 2.0 GPU with 6,144 CUDA cores into a 45-80W Windows laptop chip, targeting Apple Silicon head-on.

Nvidia has spent thirty years making the GPU that powers every AI cluster on the planet. On June 1, Jensen Huang announced the company is also coming for your laptop.
The RTX Spark - internally the N1X - is Nvidia's first System-on-Chip for Windows PCs. It combines a 20-core Arm CPU with a Blackwell 2.0 GPU on a single TSMC 3nm package, runs the full CUDA software stack, and ships this fall in laptops from Microsoft, Dell, HP, ASUS, Lenovo, and MSI. Over 30 laptop and 10 desktop designs are already confirmed. The thinnest will be 14 millimeters.
For Nvidia, the PC market is a new front. For Qualcomm, whose Windows-on-Arm exclusivity deal with Microsoft expired earlier this year, the timing is poor.
Key Specs - Nvidia RTX Spark (N1X)
| Component | Spec |
|---|---|
| CPU | 20 Arm cores (10x Cortex-X925 + 10x Cortex-A725) |
| GPU | Blackwell 2.0, 48 SMs, 6,144 CUDA cores |
| Memory | Up to 128 GB LPDDR5X, 16-channel |
| Bandwidth | 300 GB/s via NVLink C2C |
| AI Compute | 1 petaflop (claimed) |
| TDP | 45-80 W (full package) |
| Process | TSMC 3nm |
| Launch | Fall 2026 |
The CPU Half
20 Arm Cores, Two Classes
The N1X CPU die uses a classic big.LITTLE split: ten Cortex-X925 performance cores for single-threaded heavy lifting and ten Cortex-A725 efficiency cores for background tasks and power-sensitive workloads. The CPU and GPU dies connect via Nvidia's NVLink C2C interface at up to 300 GB/s - the same interconnect it uses in HGX AI server racks, now shrunk to fit in a laptop chassis.
The standard N1 (non-X) variant ships with 12 cores and 64 GB of LPDDR5X across an 8-channel bus. It targets the mid-range of the laptop market, where Qualcomm's Snapdragon X sits today.
TSMC 3nm - and What That Means for Thermals
Manufacturing on TSMC's 3nm node lets Nvidia fit this much GPU performance into a 45-80 W envelope without overheating a thin chassis. That TDP figure covers both dies. It's the same power budget as a high-end Snapdragon X Elite, but with substantially more raw GPU compute underneath it.
Jensen Huang at Taipei Music Center on June 1, announcing Nvidia's entry into the Windows PC chip market.
Source: servethehome.com
The GPU Half
6,144 CUDA Cores - Same Count as RTX 5070
The Blackwell 2.0 GPU die packs 48 Streaming Multiprocessors, which works out to 6,144 CUDA cores - the same core count as Nvidia's desktop RTX 5070. Nvidia claims 1 petaflop of AI compute from the combination, which matches what you'd get from a standalone discrete GPU two product generations ago.
That GPU runs the full Blackwell feature set: DLSS 4.5 with Multi Frame Generation, ray tracing, and Reflex. For AI inference, it means the same CUDA libraries that run on H100s and B200s run on RTX Spark without recompilation.
Running Local Models on RTX Spark
The 128 GB unified memory pool is the spec that matters most for developers. A 4-bit quantized 70B model loads in seconds - the entire weight set fits with room for context. There's no discrete VRAM cliff to fall off.
```shell
# RTX Spark (N1X): 128 GB unified, full CUDA stack on Arm
# Install Ollama (Windows on Arm build ships at launch)
winget install Ollama.Ollama
# 14B models use under 10 GB - trivial on 128 GB
ollama run qwen3:14b
# 4-bit quantized 70B fits comfortably - no paging
ollama run llama3.3:70b-instruct-q4_K_M
```
Ollama and llama.cpp already have Windows-on-Arm builds. Both should work on RTX Spark at launch without any porting work, since the CUDA layer handles the GPU path the same way it does on x86.
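As a rough check on the sizing claims above, the sketch below estimates weight storage from parameter count and bits per weight. It assumes a flat, nominal 4 bits per weight and counts weights only; real quantization formats store extra per-block scale data, and the KV cache and runtime overhead come on top, so actual usage will be higher. The `weight_gb` helper is hypothetical and not part of Ollama or any Nvidia tool.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes) for a quantized model."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


POOL_GB = 128  # unified memory on the N1X, per the spec table

# Nominal 4-bit estimates for the two model sizes mentioned in the article.
for name, params_b in [("14B", 14), ("70B", 70)]:
    need = weight_gb(params_b, bits_per_weight=4)
    print(f"{name} @ 4-bit: ~{need:.0f} GB of weights, ~{POOL_GB - need:.0f} GB left in the pool")
```

On these assumptions a 70B model needs about 35 GB for weights, which is why it fits in a 128 GB pool but not in the VRAM of most consumer discrete GPUs.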
The RTX Spark die - Blackwell 2.0 GPU and 20-core Arm CPU on a single TSMC 3nm package.
Source: servethehome.com
The Memory Architecture
128 GB Unified Pool
The 16-channel LPDDR5X bus gives the N1X 300 GB/s of memory bandwidth shared between CPU and GPU. Apple's M5 Max uses a similar unified memory model, though Apple doesn't publish the exact channel count. The 300 GB/s figure trails the roughly 546 GB/s listed for the M5 Max, but it is far above what any discrete GPU gets when it is bandwidth-limited by PCIe.
The N1 tier caps at 64 GB across an 8-channel bus. Both chips use the same NVLink C2C die-to-die interconnect, which Nvidia says adds negligible latency versus a monolithic design.
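Memory bandwidth matters for local inference because single-user token generation is often limited by how fast the weights can be streamed from memory. A common rule of thumb, sketched below, divides bandwidth by the size of the weights to get a ceiling on tokens per second. This is an assumption-heavy estimate, not a benchmark: it assumes a dense model whose full weight set is read once per token, ignores compute, caches, and KV-cache traffic, and uses an assumed 35 GB for a 70B model at a nominal 4 bits per weight. The bandwidth values are the ones quoted in this article, and `max_decode_tokens_per_s` is a hypothetical helper.

```python
def max_decode_tokens_per_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Ceiling on batch-1 decode speed if every token streams all weights once."""
    return bandwidth_gb_s / weights_gb


WEIGHTS_GB = 35.0  # assumed: 70B parameters at a nominal 4 bits per weight

# Bandwidth figures (GB/s) as quoted in this article.
bandwidth = {
    "RTX Spark N1X": 300.0,
    "Apple M5 Max": 546.0,
    "Qualcomm X Elite": 273.0,
}

for chip, bw in bandwidth.items():
    ceiling = max_decode_tokens_per_s(bw, WEIGHTS_GB)
    print(f"{chip}: at most ~{ceiling:.1f} tokens/s on a 70B 4-bit model")
```

Real throughput will land below these ceilings, but the ratio between chips is a reasonable first guess at how bandwidth alone separates them.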
How It Compares
| | RTX Spark N1X | Apple M5 Max | Qualcomm X Elite |
|---|---|---|---|
| CPU Cores | 20 (Arm) | 16 (Apple custom) | 12 (Oryon) |
| GPU Cores | 6,144 CUDA | 40-core GPU | Adreno X1-85 |
| Max Memory | 128 GB | 128 GB | 64 GB |
| Bandwidth | 300 GB/s | ~546 GB/s | ~273 GB/s |
| Neural Engine | 1 PFLOP (BF16, Nvidia) | 38 TOPS (Apple) | 75 TOPS (Qualcomm) |
| TDP | 45-80 W | 45-92 W | 45-80 W |
| Launch | Fall 2026 | Available now | Available now |
The Apple M5 Max's memory bandwidth advantage is real: roughly 546 GB/s against 300 GB/s for the N1X. The Neural Engine row uses different measurement methodologies (Nvidia's PFLOP figure is BF16 matrix throughput; Apple and Qualcomm publish INT8 TOPS). A direct apples-to-apples comparison isn't possible until third parties run the same benchmark across all three.
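To make the unit mismatch concrete, the sketch below puts the three vendor claims on the same prefix (tera-operations per second) while keeping the precision label attached. It only converts units. It does not make BF16 and INT8 figures comparable, and the `in_tera` helper and the `claims` list are illustrative, built from the numbers in the table.

```python
PETA = 1e15
TERA = 1e12

# (chip, claimed operations per second, what is being counted), as quoted in the table.
claims = [
    ("RTX Spark N1X", 1 * PETA, "BF16 FLOPs"),
    ("Apple M5 Max", 38 * TERA, "INT8 ops"),
    ("Qualcomm X Elite", 75 * TERA, "INT8 ops"),
]


def in_tera(ops_per_s: float) -> float:
    """Express an operations-per-second figure in tera-operations per second."""
    return ops_per_s / TERA


for chip, ops, kind in claims:
    print(f"{chip}: {in_tera(ops):,.0f} tera-{kind} per second")
```

Seeing 1,000 next to 38 and 75 shows how large the headline gap is, and also why it should not be read as a speedup: the precision differs, and likely the hardware block being measured does too.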
The Software Angle
CUDA on Arm Changes the Developer Story
Every CUDA workload that runs on a datacenter GPU runs on RTX Spark without rewriting a line of code. That's the argument Nvidia is making, and it's stronger than Qualcomm's or Apple's, both of which require porting to proprietary APIs (CoreML, Hexagon DSP) for GPU-accelerated AI inference.
PyTorch, TensorFlow, JAX, vLLM - these libraries have CUDA backends that work on Arm. Nvidia says they'll be available at launch. If that's accurate, RTX Spark becomes the first Windows laptop where a developer can pull a CUDA AI project from a Linux server, run it locally with no changes, and get GPU acceleration.
That's not a small thing. Right now the standard workflow for on-device AI development on Windows is to fall back to CPU or write DirectML code. RTX Spark could change that default.
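The "no changes" workflow depends on code that selects its device at runtime instead of hard-coding one. A minimal sketch of that pattern using PyTorch's `torch.cuda.is_available()` follows. Whether CUDA-enabled PyTorch builds for Windows on Arm exist at launch is Nvidia's claim and is not verified here; `pick_device` is a hypothetical helper name.

```python
try:
    import torch
except ImportError:  # PyTorch is not installed in this environment
    torch = None


def pick_device() -> str:
    """Return "cuda" when a CUDA-enabled PyTorch build sees a GPU, else "cpu"."""
    if torch is not None and torch.cuda.is_available():
        return "cuda"
    return "cpu"


device = pick_device()
print(f"selected device: {device}")

if torch is not None:
    # The same tensor code runs on either device; only the placement changes.
    x = torch.randn(256, 256, device=device)
    y = x @ x
    print(f"matmul result lives on: {y.device}")
```

Code written this way runs unchanged on a Linux server with a datacenter GPU, on a laptop with a working CUDA stack, or on a machine with no GPU at all.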
Partner laptops running RTX Spark from ASUS, Dell, Lenovo, and others, shown at Computex 2026. All ship this fall.
Source: servethehome.com
The Roadmap
Nvidia outlined three generations of RTX Spark at Computex. The first generation - shipping this fall - uses LPDDR5X memory. The second generation, codenamed Rubin, moves to LPDDR6. A third generation, Rosa Feynman, follows after that. No dates for the later two.
The datacenter counterpart, the Vera CPU, is already in full production and shipping to Anthropic, OpenAI, and SpaceX. Huang said Vera creates tokens 1.8x faster than x86 today. The PC chip and the server chip share the same Arm architecture and CUDA software stack - which is exactly what Nvidia wants, since it collapses the PC-to-datacenter development gap.
Where It Falls Short
RTX Spark doesn't have pricing. No configurations have retail prices attached, which makes it impossible to compare cost against the M5 or Snapdragon X. Apple's M5 Max machines start around $2,499. If RTX Spark laptops land notably above that, the addressable market shrinks fast.
The Windows-on-Arm software library is still incomplete. Most mainstream x86 apps run via emulation, and emulated performance is slower. Apple Silicon has had more than five years to shake out app compatibility; RTX Spark starts from scratch on a different Arm implementation.
Nvidia also has no history of driver support for consumer PC SoCs. Its datacenter drivers are excellent. Its consumer GPU drivers have a long record of bugs, delayed releases, and inconsistent behavior on new hardware. Whether that changes for a platform where the company controls the whole stack remains to be seen.
The Intel vs. Arm debate on agentic AI silicon is far from settled. Intel's Clearwater Forest and AMD's next-gen mobile chips will have their own AI compute stories by the time RTX Spark ships. The competitive picture in Fall 2026 may look different from what Computex suggested.
Jensen Huang called it "the new PC." He says that about a lot of products. The CUDA-on-Arm bet is real, the specs are competitive, and the OEM lineup is serious. The question is whether the software ecosystem catches up before a second generation makes the first one obsolete.
