Alibaba's C950 - First RISC-V CPU with Native LLM Inference
Alibaba's T-Head division launched the XuanTie C950, a 5nm 3.2GHz RISC-V server chip that sets a new world record for RISC-V single-core performance and natively runs billion-parameter models like DeepSeek V3 and Qwen3.

At a RISC-V ecosystem conference in Shanghai on March 24, 2026, Alibaba's T-Head semiconductor division unveiled the XuanTie C950 - a 5nm server CPU that the company claims sets a new world record for RISC-V single-core performance and, for the first time on this architecture, runs billion-parameter language models completely on CPU.
This isn't a GPU competitor. It's something more specific: a CPU-centric path to AI inference that doesn't touch Nvidia's IP, doesn't require Arm licenses, and runs on an architecture that China can develop and manufacture without asking anyone's permission.
Key Specs
| Spec | Value |
|---|---|
| Process node | 5nm |
| Clock speed | 3.2 GHz |
| Decode width | 8 instructions |
| Pipeline depth | 16 stages |
| SPECint2006 single-core | 70+ (RISC-V world record) |
| Performance vs C920 | 3x overall |
| Memory bandwidth vs C920 | 4x |
| AI engine | Vector + Matrix Acceleration (integrated) |
| LLM support | Qwen3, DeepSeek V3 (billion-parameter, native hardware) |
Under the Hood
Core Architecture
The C950 is built on a 5nm process at 3.2 GHz with an 8-instruction decode width and a 16-stage pipeline. Those are competitive specs for a server CPU in 2026 - not leading-edge by TSMC N3 standards, but a wide, deep core design suited to the low-latency, high-throughput serving workloads T-Head is targeting.
T-Head's previous server-grade chip, the C930, launched in February 2025 and began shipping a month later. The C920, its predecessor, has been in wide deployment since 2024. The C950 is a two-year development effort and the first chip in the XuanTie line designed from the ground up with LLM inference as a first-class workload.
RISC-V is an open-standard instruction set architecture - there are no licensing fees, no US-entity IP dependencies, and no export-control exposure from the ISA itself. That architectural independence is part of why Chinese companies have invested heavily in RISC-V development. T-Head has now shipped over 470,000 AI chips, with annual revenue approaching $1.45 billion, and the C950 is the flagship of that portfolio.
AI Acceleration Engine
The chip integrates two purpose-built accelerators: a Vector Acceleration Engine and a Matrix Acceleration Engine, both co-designed with the CPU cores rather than bolted on. The matrix engine handles the tensor operations that dominate transformer inference. The vector engine handles the embedding lookups, attention heads, and intermediate activations.
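The division of labor between the two engines maps cleanly onto a transformer forward pass. The sketch below is purely illustrative - toy dimensions, NumPy rather than any T-Head API - but it shows which operations are dense matrix work (projections, QK^T) and which are the elementwise and reduction work a vector engine handles (softmax's exponentials and row sums).

```python
import numpy as np

# Illustrative single-head attention forward pass with toy dimensions.
# Comments mark which operations would fall to a matrix engine (GEMMs)
# versus a vector engine (elementwise ops and reductions).
d, n = 64, 8                       # hypothetical head dim, sequence length
rng = np.random.default_rng(0)
x = rng.standard_normal((n, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

# Matrix-engine territory: dense projections and the QK^T product.
q, k, v = x @ Wq, x @ Wk, x @ Wv
scores = (q @ k.T) / np.sqrt(d)

# Vector-engine territory: softmax is exponentials plus row reductions.
scores -= scores.max(axis=-1, keepdims=True)
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)

out = weights @ v                  # back to matrix-engine work
print(out.shape)                   # (8, 64)
```

In a real decoder the GEMMs dominate the FLOP count, which is why a dedicated matrix unit is the headline feature; the vector engine keeps the softmax and normalization steps from becoming the bottleneck between them.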
The C950 is the first RISC-V processor to natively support billion-parameter LLM inference at the hardware level. "Native" here matters: the chip doesn't just run these models via software emulation. The instruction set extensions and hardware units are designed to execute the core operations of models like Qwen3 and DeepSeek V3 without ISA translation overhead.
It also runs standard cloud workloads - MySQL, Redis, Nginx, OpenSSL - so the same chip can serve both inference and the surrounding infrastructure in a CPU-only server rack.
The XuanTie C950 debuted at Alibaba DAMO Academy's ecosystem conference in Shanghai on March 24, 2026.
Source: scmp.com
What the Numbers Say
The SPECint2006 Record
The C950 scores over 70 points on SPECint2006 single-core. That's a new world record for RISC-V architecture, though the benchmark itself deserves some context: SPECint2006 was retired by the SPEC consortium years ago, and it measures integer performance on a fixed workload suite that doesn't directly predict LLM throughput. It's still widely used as a cross-architecture comparison tool, and the score represents a 3x improvement over the C920.
Memory bandwidth increased by 4x compared to the C920. For transformer inference, memory bandwidth is often the binding constraint - a 4x improvement there translates more directly to throughput gains than the raw SPECint score does.
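Why bandwidth binds: autoregressive decoding reads roughly every weight once per generated token, so peak tokens per second is capped at bandwidth divided by model size in bytes. T-Head hasn't published the C950's bandwidth in GB/s, so the numbers below are hypothetical - the point is only that a 4x bandwidth gain moves this ceiling 4x.

```python
# Back-of-envelope decode ceiling: every weight is read once per token,
# so tokens/s <= bandwidth / model_bytes. All figures are hypothetical.
def max_tokens_per_s(bandwidth_gbs: float, params_b: float,
                     bytes_per_param: float) -> float:
    model_bytes = params_b * 1e9 * bytes_per_param
    return bandwidth_gbs * 1e9 / model_bytes

# A 7B-parameter model in int8 against a hypothetical 100 GB/s system:
print(round(max_tokens_per_s(100, 7, 1), 1))   # ~14.3 tokens/s ceiling
# The same model with 4x the bandwidth (the C920 -> C950 claim):
print(round(max_tokens_per_s(400, 7, 1), 1))   # ~57.1
```

The same arithmetic explains why SPECint alone can't predict inference throughput: the benchmark fits in cache, while decoding streams the entire model through memory on every token.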
The C950's predecessor, the C930, topped 15 points per GHz on SPECint2006. The C950 clears 70 total at 3.2 GHz - roughly 22 points per GHz, a meaningful step up in per-clock efficiency.
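The per-clock comparison is simple arithmetic on the two published figures:

```python
# C950: ~70 SPECint2006 at 3.2 GHz; C930: reported 15+ points per GHz.
c950_per_ghz = 70 / 3.2
print(round(c950_per_ghz, 1))        # 21.9 points/GHz
print(round(c950_per_ghz / 15, 2))   # ~1.46x the C930's per-clock figure
```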
LLM Inference Claims
T-Head hasn't published specific tokens-per-second figures for the C950 on Qwen3 or DeepSeek V3. The claim at the conference was that the chip reaches "first-time native support for billion-parameter LLM inference on RISC-V," but the performance numbers that would let a developer decide whether this is competitive with, say, a mid-range Nvidia GPU haven't appeared yet.
Inference on a CPU is slower than on a GPU for most production workloads. The C950's advantage isn't raw throughput - it's cost per server slot, power efficiency for sustained inference, and the absence of GPU supply constraints that have repeatedly disrupted AI infrastructure buildouts. For developers building applications around open-source models like DeepSeek V3 or Qwen3, a CPU inference path that actually works at scale is worth tracking.
Alibaba's T-Head division launched the C950 with the companion C925 efficiency chip at the same event.
Source: pandaily.com
Compatibility and Ecosystem
| Workload | C950 Support | Notes |
|---|---|---|
| Qwen3 inference | Native hardware | Explicitly supported at launch |
| DeepSeek V3 inference | Native hardware | Explicitly supported at launch |
| General cloud (MySQL, Redis, Nginx) | Full | Standard server workload support |
| OpenSSL / secure enclaves | Yes | Native confidential computing built in |
| LLM training (large scale) | Not targeted | GPU clusters remain the standard |
| GPU-style tensor parallelism | No | Single-CPU inference only at present |
The chip is fully compliant with both mandatory and optional RISC-V extension instruction sets, which means existing RISC-V software can run on it without modification. Alibaba announced the C950 core design is available for licensing to IC developers - consistent with how T-Head has distributed prior XuanTie cores, which now power nearly 1,000 different devices across servers, robotics, and electric vehicles.
The RISC-V Factor
The C950 landed with a statement from Ni Guangnan, a member of the Chinese Academy of Engineering, who noted that RISC-V now accounts for roughly 25% of the global processor market. SHD Group projects 36 billion RISC-V device shipments by 2031, with market value exceeding $300 billion.
That context matters for what the C950 represents strategically. Chinese open-source LLMs captured approximately 30% global market share in 2026, up from 1.2% in 2024. The models are there. The training infrastructure is developing - see China's $70 billion chip subsidy program. What's been missing is a domestic inference chip that can run those models without routing through Nvidia hardware constrained by US export controls.
Huawei's Ascend chips target the same problem via a different architecture - GPU-style NPUs with a proprietary software stack. T-Head's bet is that a CPU-first approach, built on a royalty-free open ISA, is more deployable at scale because it drops into existing server racks without specialized interconnects or a new software ecosystem. Whether that bet is right depends on whether the throughput numbers, when they come, are anywhere near competitive.
For the comparison at the infrastructure layer: custom silicon designed for AI inference isn't a new idea. Amazon's Inferentia and Trainium programs took years of internal development before producing chips that could meaningfully displace GPU capacity in Amazon's own datacenters. T-Head is at a similar inflection point with the C950 - the architecture exists, the software support for key models is claimed, and deployment at scale is the next test.
Where It Falls Short
Three things stand out.
No throughput benchmark. The SPECint2006 score confirms that the CPU is fast. It doesn't tell you how long it takes to produce 1,000 tokens from DeepSeek V3. That number is what actually matters for inference deployments, and it hasn't been published.
CPU inference has a ceiling. A C950 socket can batch only a handful of concurrent requests before its memory bandwidth saturates. Modern GPU inference clusters run hundreds of batched requests simultaneously. For high-concurrency production inference - the kind that serves a public API at scale - the parallelism advantage of GPU architectures is substantial. A CPU chip that runs Qwen3 is useful for edge deployment, private inference, or cost-sensitive small-scale use cases. It doesn't displace a rack of Nvidia H100s for a major API endpoint.
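The batching gap above comes down to arithmetic intensity. A batched matmul reads the weight matrix once for the whole batch, so useful FLOPs per weight byte grow linearly with batch size - and GPUs have the compute to cash in on large batches where a CPU does not. Toy numbers, not C950 measurements:

```python
# Arithmetic intensity of a (batch, d) x (d, d) matmul: the weights are
# read once regardless of batch size, so FLOPs per weight byte scale
# linearly with batch. Illustrative only; fp16 weights assumed.
def flops_per_weight_byte(batch: int, d: int,
                          bytes_per_weight: int = 2) -> float:
    flops = 2 * batch * d * d            # multiply-accumulate count
    weight_bytes = d * d * bytes_per_weight
    return flops / weight_bytes

print(flops_per_weight_byte(1, 4096))    # 1.0  -> bandwidth-bound
print(flops_per_weight_byte(64, 4096))   # 64.0 -> compute starts to pay
```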
No commercial availability date. T-Head announced the chip and its licensing program, but shipping dates weren't disclosed. The C930 launched in February 2025 and started shipping a month later; if the C950 follows a similar pattern, units may reach customers in April or May 2026. For now it's a benchmark and an architecture claim.
The C950 is the most capable RISC-V server chip anyone has announced. That's a meaningful milestone in an architecture that most Western AI infrastructure teams have largely ignored. Whether it turns into deployed inference capacity depends on numbers T-Head hasn't released yet.