Huawei Ascend 910C

TL;DR

  • Huawei's flagship AI accelerator - the most powerful AI chip designed and manufactured entirely within China
  • 96GB HBM2e memory with ~1,800 GB/s bandwidth, a substantial upgrade over the 910B's 64GB and ~1,200 GB/s
  • Estimated ~800 TFLOPS FP16 peak compute - roughly 2.5x the A100's paper spec, though delivered performance lands closer to A100-class and trails the H100 significantly
  • Built on SMIC's 7nm process node under US export control constraints that block access to EUV lithography
  • Powers an expanding Chinese AI ecosystem including DeepSeek, Baidu, Alibaba, and ByteDance workloads

Overview

The Huawei Ascend 910C is the most capable AI accelerator that China can currently design and manufacture domestically. That sentence carries two meanings, and both matter. It is genuinely impressive hardware - 96GB of HBM2e, an estimated 800 TFLOPS of FP16 compute, and a growing software ecosystem. It is also constrained by US export controls that deny Huawei access to TSMC's advanced nodes and EUV lithography equipment, forcing SMIC's 7nm DUV process to carry the entire manufacturing burden.

Released in H2 2024, the 910C represents a meaningful generational improvement over the Ascend 910B. Memory capacity jumps from 64GB to 96GB HBM2e, bandwidth increases from approximately 1,200 GB/s to 1,800 GB/s, and compute performance sees an estimated 30-35% improvement. These gains come from architectural optimization within the same SMIC 7nm process - Huawei cannot simply move to a smaller node the way AMD and NVIDIA can with TSMC's 3nm and 5nm processes. Every performance gain has to come from better circuit design, packaging, and architectural choices rather than transistor shrinks.

The 910C matters geopolitically as much as it matters technically. China's ability to train and deploy frontier AI models depends on domestic hardware supply. DeepSeek's V4 was reportedly optimized for Ascend chips rather than NVIDIA GPUs - a deliberate choice to prove that Chinese AI can be built on Chinese silicon. Baidu, Alibaba, Tencent, and ByteDance have all committed to Ascend-based infrastructure to varying degrees. The 910C is not competing with the H100 on pure performance - it is building an alternative ecosystem that can function independently of American technology. Whether that ecosystem can sustain frontier AI development without access to cutting-edge manufacturing is the trillion-dollar question.

Key Specifications

| Specification | Details |
|---|---|
| Manufacturer | Huawei (HiSilicon) |
| Product Family | Ascend 910 |
| Architecture | Da Vinci (enhanced) |
| Process Node | SMIC 7nm (N+2) |
| Chip Type | ASIC |
| AI Cores | 32 Da Vinci cores (estimated) |
| FP16 Performance | ~800 TFLOPS (estimated) |
| BF16 Performance | ~800 TFLOPS (estimated) |
| INT8 Performance | ~1,600 TOPS (estimated) |
| Memory | 96GB HBM2e |
| Memory Stacks | 6x HBM2e (estimated) |
| Memory Bandwidth | ~1,800 GB/s |
| Interconnect | HCCS (Huawei Cache Coherence System) |
| TDP | 600W |
| Form Factor | Proprietary module |
| Software Stack | CANN (Compute Architecture for Neural Networks) |
| Target Workload | Training and Inference |
| Release Date | H2 2024 |
| Estimated Price | $12,000-$18,000 |

Note: Huawei does not publish detailed specifications for the Ascend 910C. The numbers above are based on leaked documentation, third-party testing, analyst estimates, and inference from Huawei's marketing materials. Actual specifications may differ.

Performance Benchmarks (Estimated)

| Benchmark / Metric | Ascend 910C | Ascend 910B | NVIDIA H100 SXM | NVIDIA A100 80GB |
|---|---|---|---|---|
| FP16 Peak (TFLOPS) | ~800 | ~600 | 990 | 312 |
| Memory Capacity | 96GB | 64GB | 80GB | 80GB |
| Memory Bandwidth | ~1,800 GB/s | ~1,200 GB/s | 3,350 GB/s | 2,039 GB/s |
| LLM Inference (relative) | ~0.4-0.5x H100 | ~0.3-0.4x H100 | 1.0x (baseline) | ~0.5-0.6x H100 |
| Training Throughput (relative) | ~0.5-0.6x H100 | ~0.3-0.4x H100 | 1.0x (baseline) | ~0.5x H100 |
| Power (TDP) | 600W | 400W | 700W | 400W |
| Price (estimated) | $12,000-$18,000 | $8,000-$12,000 | $25,000-$40,000 | $10,000-$15,000 |

These relative performance figures are approximate. The 910C's performance relative to the H100 varies significantly by workload type, model architecture, and how well the model has been optimized for Huawei's CANN software stack. Workloads that have been specifically optimized for Ascend hardware - as DeepSeek's models reportedly are - can achieve higher utilization than generic ports from CUDA.

The memory bandwidth gap is the most significant technical limitation. At ~1,800 GB/s, the 910C has roughly 54% of the H100's 3,350 GB/s. For memory-bandwidth-bound inference workloads (which most LLM serving is), this directly translates to lower tokens-per-second per chip. The 910C partially compensates with its larger 96GB memory, which can reduce the need for multi-chip configurations on some model sizes.
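
A back-of-envelope roofline makes the bandwidth argument concrete: in the autoregressive decode phase, every generated token must stream at least the model's weights from HBM, so single-stream tokens per second is bounded by bandwidth divided by weight bytes. A sketch using this article's estimated figures (illustrative, not measured):

```python
# Rough roofline: each generated token must stream (at least) the model
# weights from HBM, so single-stream tokens/sec <= bandwidth / weight_bytes.
# Figures are this article's estimates, not measured numbers.

def decode_ceiling_tok_s(bandwidth_gb_s: float, params_b: float, bytes_per_param: float) -> float:
    weight_gb = params_b * bytes_per_param  # weight footprint in GB
    return bandwidth_gb_s / weight_gb       # upper bound; ignores KV cache and kernel overhead

# 70B-parameter model served in INT8 (1 byte/param, ~70GB of weights)
ascend_910c = decode_ceiling_tok_s(1800, 70, 1)
h100_sxm = decode_ceiling_tok_s(3350, 70, 1)

print(f"910C ceiling: {ascend_910c:.1f} tok/s")   # ~25.7 tok/s
print(f"H100 ceiling: {h100_sxm:.1f} tok/s")      # ~47.9 tok/s
print(f"ratio: {ascend_910c / h100_sxm:.2f}x")    # ~0.54x, mirroring the bandwidth ratio
```

Measured throughput lands well below these ceilings (the inference table later in this article shows ~15-20 tok/s for a 70B INT8 model on the 910C), but the ratio between chips tracks the bandwidth ratio closely.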

Key Capabilities

Domestic Manufacturing Independence. The 910C's most important capability is not a spec - it is the fact that it exists at all. Manufactured on SMIC's 7nm process using DUV (deep ultraviolet) lithography rather than the EUV lithography that TSMC, Samsung, and Intel use for their most advanced nodes, the 910C demonstrates that China can produce competitive (if not leading-edge) AI accelerators despite US export controls. The DUV approach requires multi-patterning techniques that increase cost and reduce yield, but Huawei and SMIC have made it work at production scale. This gives Chinese AI companies a supply chain that no US policy action can disrupt.

CANN Software Ecosystem. Huawei's CANN (Compute Architecture for Neural Networks) is the software stack that makes Ascend hardware programmable. It includes a neural network compiler, operator libraries, and framework adapters for PyTorch, TensorFlow, and MindSpore (Huawei's own framework). CANN has improved significantly since the 910B generation, with better PyTorch compatibility and more optimized operators. DeepSeek's decision to optimize V4 for Ascend hardware has forced rapid maturation of the CANN stack, because frontier model training is the most demanding test of any AI software platform. The CANN ecosystem is still far smaller than CUDA's, but it is now production-grade for Transformer workloads.

96GB HBM2e Memory. The increase from the 910B's 64GB to 96GB is significant for inference workloads. A 70B-parameter model in FP16 requires approximately 140GB - too large for a single 910C, but the 96GB capacity means that with INT8 quantization (which halves memory requirements to ~70GB), the model fits on a single chip. The 910B's 64GB could not manage this even with INT8. For Chinese inference providers serving domestic models like Qwen, Yi, and DeepSeek, the 910C's memory capacity meaningfully expands the set of models that can be served with minimal tensor parallelism.
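
The fit-or-not arithmetic behind that claim is simple: weight footprint is parameter count times bytes per parameter, with some HBM held back for KV cache and activations. A sketch with an assumed 10% serving overhead (the overhead figure is illustrative, not a Huawei number):

```python
# Fit-or-not arithmetic for single-chip serving. The 10% overhead reserved for
# KV cache and activations is an illustrative assumption, not a Huawei figure.

def fits_on_chip(params_b: float, bytes_per_param: float, hbm_gb: float,
                 overhead: float = 0.10):
    """Return (weight_gb, fits) for a model at a given precision."""
    weight_gb = params_b * bytes_per_param
    return weight_gb, weight_gb <= hbm_gb * (1 - overhead)

for chip, hbm in [("910C", 96), ("910B", 64)]:
    for precision, bpp in [("FP16", 2), ("INT8", 1)]:
        weight, ok = fits_on_chip(70, bpp, hbm)
        print(f"70B {precision} ({weight:.0f}GB) on {chip} ({hbm}GB): "
              f"{'fits' if ok else 'does not fit'}")
```

Only the 70B INT8 case on the 910C fits; FP16 on either chip and INT8 on the 910B's 64GB do not, which is exactly the single-chip serving boundary described above.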

Pricing and Availability

Huawei does not publish official pricing for the Ascend 910C. Market estimates place the price at $12,000 to $18,000 per accelerator, though actual pricing varies significantly based on volume, customer relationship, and whether the purchase includes Huawei's Atlas server platform or standalone accelerators.

| Accelerator | Estimated Price | Memory | FP16 TFLOPS | Process Node |
|---|---|---|---|---|
| Huawei Ascend 910C | $12,000-$18,000 | 96GB | ~800 | SMIC 7nm |
| Huawei Ascend 910B | $8,000-$12,000 | 64GB | ~600 | SMIC 7nm |
| NVIDIA H100 SXM | $25,000-$40,000 | 80GB | 990 | TSMC 4nm |
| NVIDIA A100 80GB | $10,000-$15,000 | 80GB | 312 | TSMC 7nm |
| AMD MI300X | $10,000-$15,000 | 192GB | 1,307 | TSMC 5nm/6nm |

The 910C is available primarily through Huawei's Atlas server platforms and through Chinese cloud providers including Huawei Cloud, Alibaba Cloud (for government and enterprise customers), and various state-backed data center operators. Export restrictions make the 910C effectively unavailable outside of China and a limited number of countries.

Availability within China has reportedly been strong, with Huawei ramping production through 2025. Several Chinese hyperscalers have placed large orders, and the Chinese government has provided subsidies and procurement guarantees that ensure demand. The supply constraint is not Huawei's assembly capacity but SMIC's 7nm wafer output, which remains limited compared to TSMC's advanced nodes.

Architecture Deep Dive

The 910C's architecture is a study in optimization under constraint. Huawei's HiSilicon design team cannot access TSMC's advanced nodes, EUV lithography, or the latest packaging technologies that AMD and NVIDIA use. Every transistor must be manufactured on SMIC's 7nm DUV (deep ultraviolet) process, which limits transistor density, clock speed, and power efficiency relative to TSMC 4nm or 3nm. The 910C's architectural gains over the 910B come entirely from circuit design, memory integration, and packaging improvements within this constrained process.

Da Vinci Core Architecture. The Ascend 910 series uses Huawei's Da Vinci AI core architecture, which is organized around a 3D Cube Computing Engine - a systolic array-like structure optimized for matrix multiplication. Each Da Vinci core contains:

| Component | Function | Estimated Specs (910C) |
|---|---|---|
| Cube Unit | Dense matrix multiplication (FP16, BF16, INT8) | 16x16x16 cube, ~25 TFLOPS FP16 per core |
| Vector Unit | Element-wise operations, activation functions, normalization | 256-bit vector width (est.) |
| Scalar Unit | Control flow, addressing, scalar math | Standard RISC pipeline |
| Local Buffer | On-chip SRAM for operand staging | ~512KB per core (est.) |

The 910C is estimated to contain 32 Da Vinci cores (same count as the 910B), but with each core running at higher clock speeds and improved microarchitecture. The per-core throughput improvement is estimated at 30-35%, which - across 32 cores - accounts for the chip-level improvement from ~600 TFLOPS to ~800 TFLOPS FP16.
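
The chip-level figure is just cores times per-core throughput, and the generational step shows up as a per-core uplift at constant core count. A quick sanity check on the estimates in this section:

```python
# Sanity check on the estimated core math: chip peak = cores x per-core TFLOPS.
cores = 32  # estimated Da Vinci core count (unchanged from the 910B)
per_core_910b = 600 / cores   # ~18.75 TFLOPS FP16 per core (est.)
per_core_910c = 800 / cores   # ~25 TFLOPS FP16 per core (est.)
uplift = per_core_910c / per_core_910b - 1
print(f"per-core: {per_core_910b:.2f} -> {per_core_910c:.2f} TFLOPS "
      f"({uplift:.0%} uplift)")  # 33%, within the estimated 30-35% range
```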

Memory Subsystem Upgrade. The memory upgrade from 910B to 910C is arguably more significant than the compute improvement:

| Memory Property | Ascend 910B | Ascend 910C | Improvement |
|---|---|---|---|
| Capacity | 64GB HBM2e | 96GB HBM2e | +50% |
| Stacks | 4x HBM2e | 6x HBM2e (est.) | +50% |
| Per-Stack Bandwidth | ~300 GB/s | ~300 GB/s | Same (same HBM2e gen) |
| Aggregate Bandwidth | ~1,200 GB/s | ~1,800 GB/s | +50% |
| HBM Generation | HBM2e | HBM2e | Same |
| Memory Controller | Da Vinci gen 1 | Da Vinci gen 2 (est.) | Improved efficiency |

The bandwidth increase is proportional to the stack count increase - Huawei added more HBM2e stacks rather than moving to a faster HBM generation. This is a practical decision given supply chain constraints: HBM3 stacks from SK Hynix and Samsung are subject to US export controls and may not be available to Huawei. Using more HBM2e stacks that can be sourced domestically or from non-restricted suppliers is a more reliable approach, even if it means forgoing the per-stack bandwidth improvements of HBM3.
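
The stack math is worth making explicit, since it shows why the 910C's bandwidth ceiling is set by the HBM generation rather than the stack count:

```python
# Aggregate bandwidth scales linearly with stack count when the HBM generation
# (and hence per-stack bandwidth) is held constant.
PER_STACK_GB_S = 300  # HBM2e per-stack bandwidth (estimated)

for chip, stacks in [("910B", 4), ("910C", 6)]:
    print(f"{chip}: {stacks} stacks x {PER_STACK_GB_S} GB/s = "
          f"{stacks * PER_STACK_GB_S} GB/s")

# An HBM3 stack delivers roughly double the per-stack bandwidth, so a move to
# HBM3 - if supply allowed it - would lift aggregate bandwidth without adding stacks.
```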

HCCS Interconnect. Huawei's HCCS (Huawei Cache Coherence System) provides chip-to-chip communication for multi-accelerator configurations. Details are sparse, but available information suggests:

| Interconnect Property | HCCS (910C) | NVLink 4.0 (H100) | Infinity Fabric (MI300X) |
|---|---|---|---|
| Bidirectional Bandwidth (per chip) | ~400 GB/s (est.) | 900 GB/s | 896 GB/s |
| Max Connected Chips | 8 (in Atlas 900 node) | 8 (DGX H100) | 8 (MI300X OAM platform) |
| Topology | Ring/mesh (est.) | NVSwitch crossbar | Point-to-point mesh |
| Inter-node | Proprietary (RoCE-based) | InfiniBand/Ethernet | InfiniBand/Ethernet |

The HCCS bandwidth gap is significant. At approximately 400 GB/s, the 910C's interconnect provides less than half the bandwidth of NVLink or Infinity Fabric. For workloads that require heavy all-reduce communication (large-batch distributed training), this bottleneck limits scaling efficiency beyond a single node. DeepSeek and others have reportedly worked around this limitation through communication-efficient training algorithms that reduce the frequency and volume of inter-chip data movement.
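
To see how the ~400 GB/s estimate translates into training overhead, consider an idealized ring all-reduce across the 8 chips of one node: each chip moves roughly 2 x (N-1)/N times the gradient payload over its links. A simplified sketch (the payload size is illustrative, and treating the bidirectional figure as usable link bandwidth is an assumption):

```python
def allreduce_time_ms(payload_gb: float, n_chips: int, link_gb_s: float) -> float:
    """Idealized ring all-reduce: each chip sends/receives 2*(N-1)/N of the payload."""
    traffic_gb = 2 * (n_chips - 1) / n_chips * payload_gb
    return traffic_gb / link_gb_s * 1000

payload_gb = 20  # e.g. FP16 gradients for a ~10B-parameter shard (illustrative)
for name, bw in [("HCCS (910C, est.)", 400), ("NVLink 4.0 (H100)", 900)]:
    print(f"{name}: {allreduce_time_ms(payload_gb, 8, bw):.1f} ms per all-reduce")
# ~87.5 ms on HCCS vs ~38.9 ms on NVLink for the same payload
```

Halving the frequency or size of these collectives, as communication-efficient training algorithms do, directly shrinks the penalty of the slower link.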

SMIC 7nm Process Constraints. SMIC's 7nm (marketed as "N+2") process uses DUV multi-patterning rather than EUV single-patterning. The practical implications:

| Property | SMIC 7nm (DUV) | TSMC 4nm (EUV) | Impact on 910C |
|---|---|---|---|
| Transistor Density | ~90 MTr/mm² (est.) | ~130 MTr/mm² | ~30% fewer transistors per mm² |
| Max Practical Die Size | ~400 mm² (est.) | ~800 mm² | Limits on-chip resources |
| Power Efficiency | ~0.7x TSMC 4nm (est.) | Baseline | Higher watts per TFLOP |
| Yield (large dies) | Lower | Higher | Higher per-chip cost |
| Clock Speed | Limited | Higher achievable | Limits per-core throughput |

These constraints mean that the 910C simply cannot match the transistor count, clock speed, or power efficiency of TSMC-manufactured chips in the same generation. Huawei compensates through architectural optimization and by accepting higher power consumption per TFLOP - the 910C's 600W TDP is moderate in absolute terms but delivers fewer TFLOPS per watt than the H100 on compute-bound workloads.
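
On paper the efficiency gap looks modest; it is delivered utilization that widens it. A quick peak-number comparison using the TDP and TFLOPS estimates in this article:

```python
# (peak FP16 TFLOPS, TDP in watts) - estimates from this article
chips = {"Ascend 910C": (800, 600), "NVIDIA H100 SXM": (990, 700)}

for name, (tflops, watts) in chips.items():
    print(f"{name}: {tflops / watts:.2f} peak TFLOPS per watt")
# 910C ~1.33 vs H100 ~1.41 on paper; the real gap is larger because delivered
# utilization on the 910C is typically lower than on the H100.
```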

Atlas Server Platform Integration. The 910C is deployed through Huawei's Atlas series server platforms, which are purpose-built for Ascend accelerators:

| Platform | Ascend Chips | Total Memory | Use Case |
|---|---|---|---|
| Atlas 800 Training Server | 8x 910C | 768GB HBM2e | Distributed training |
| Atlas 800 Inference Server | 8x 910C | 768GB HBM2e | High-throughput inference |
| Atlas 900 AI Cluster | 64x 910C (8 nodes) | 6TB HBM2e | Large-scale pre-training |
| Atlas 200 Edge Module | 1x 910C (or lower) | 96GB HBM2e | Edge inference (limited) |

The Atlas platform provides integrated cooling, power delivery, and HCCS interconnect backplane - similar to how NVIDIA's DGX systems integrate NVLink and NVSwitch. The vertical integration means that purchasing 910C chips outside of Atlas platforms is difficult, tying hardware decisions to Huawei's server ecosystem.

Comparison with Original Ascend 910 (Pre-Sanctions). The original Ascend 910, announced in 2019, was designed for TSMC 7nm manufacturing. When US sanctions blocked TSMC access, Huawei redesigned the chip for SMIC's 7nm DUV process. The 910B and 910C represent this redesigned lineage:

| Property | Ascend 910 (original, TSMC) | Ascend 910B (SMIC) | Ascend 910C (SMIC) |
|---|---|---|---|
| Process | TSMC 7nm EUV | SMIC 7nm DUV | SMIC 7nm DUV |
| FP16 Compute | ~256 TFLOPS | ~600 TFLOPS | ~800 TFLOPS |
| Memory | 32GB HBM2e | 64GB HBM2e | 96GB HBM2e |
| Status | Limited production | Mass production | Mass production |
| TDP | 310W | 400W | 600W |

The architectural improvements from 910 to 910B/910C are substantial and demonstrate that HiSilicon's design team has continued to improve the Da Vinci architecture despite the process node constraint. The 910C delivers approximately 3x the compute of the original 910 on the same effective process generation - a testament to microarchitectural optimization.

Real-World Performance Analysis

Training Performance on Chinese Frontier Models. The 910C's most important performance validation comes from its use in training Chinese frontier models. While exact numbers are scarce, leaked benchmarks and industry reports provide approximate comparisons:

| Training Workload | Ascend 910C (est.) | NVIDIA H100 SXM | Performance Ratio |
|---|---|---|---|
| Qwen 72B (tokens/sec/chip) | ~800-1,000 | ~1,600-2,000 | ~0.5x |
| DeepSeek-style MoE (per chip) | ~900-1,100 | ~1,800-2,200 | ~0.5x |
| GPT-3 175B (per chip) | ~600-800 | ~1,400-1,700 | ~0.4-0.5x |
| BERT Large fine-tuning | ~400-500 samples/s | ~800-1,000 samples/s | ~0.5x |

The consistent ~0.5x ratio to the H100 on training workloads reflects the combination of lower compute (800 vs 990 TFLOPS FP16) and significantly lower memory bandwidth (1,800 vs 3,350 GB/s). For memory-bandwidth-bound operations like attention computation in long-context training, the gap widens further.

Inference Performance. For inference serving, the 910C's 96GB memory is its most valuable asset:

| Model | Precision | 910C Throughput (est.) | H100 Throughput | Notes |
|---|---|---|---|---|
| Qwen 72B | INT8 (~70GB) | ~15-20 tok/s | ~45-55 tok/s | Single chip, 910C memory-bandwidth limited |
| ChatGLM-6B | FP16 (~12GB) | ~80-100 tok/s | ~200-250 tok/s | Small model, compute-limited |
| DeepSeek V3 (MoE) | INT8 | Multi-chip | Multi-chip | Both require distributed serving |
| Yi-34B | INT8 (~34GB) | ~25-35 tok/s | ~60-80 tok/s | Single chip on both |

The pattern is consistent: the 910C delivers approximately 35-45% of H100 inference throughput per chip. The bandwidth bottleneck (1,800 vs 3,350 GB/s) is the primary limiter for the autoregressive decode phase of LLM inference, which is dominated by memory reads.

CANN Software Stack Performance. The CANN ecosystem has matured substantially between the 910B and 910C generations. Key framework support status:

| Framework/Library | CANN Support Status | Notes |
|---|---|---|
| PyTorch (via Ascend adapter) | Production-grade | Most operators supported, some gaps remain |
| MindSpore (Huawei native) | Full support | Best-optimized path for Ascend hardware |
| TensorFlow | Functional | Less investment than PyTorch/MindSpore |
| vLLM | Community port (experimental) | Not officially supported, limited features |
| DeepSpeed | Adapted (DeepSeek fork) | Modified for HCCS communication patterns |
| ONNX Runtime | Functional | CANN execution provider available |
| Flash Attention equivalent | CANN-native implementation | Lower performance than CUDA Flash Attention |

The biggest CANN gap versus CUDA is in inference serving engines. vLLM, TensorRT-LLM, and SGLang - the workhorses of Western inference infrastructure - either do not support CANN or have experimental support only. Chinese inference providers have built custom serving solutions on top of CANN's lower-level APIs, but these are not open-source and not transferable across organizations.

Scaling Efficiency at Cluster Scale. For training workloads that require hundreds or thousands of 910C chips, scaling efficiency becomes the critical metric. The HCCS interconnect's lower bandwidth means that communication overhead grows faster with chip count than on NVLink-based systems:

| Cluster Size | Estimated Scaling Efficiency (910C) | Estimated Scaling Efficiency (H100) | Gap |
|---|---|---|---|
| 8 chips (1 node) | ~90-95% | ~95-98% | Small |
| 64 chips (8 nodes) | ~75-85% | ~88-93% | Moderate |
| 256 chips (32 nodes) | ~60-75% | ~80-88% | Significant |
| 1,024 chips (128 nodes) | ~45-60% | ~70-82% | Large |
| 2,048+ chips | ~35-50% | ~65-78% | Very large |

These are rough estimates that vary significantly by workload. MoE (Mixture of Experts) architectures like DeepSeek V3 can achieve better scaling efficiency because they require less all-reduce communication per training step. This partly explains why DeepSeek chose the MoE architecture for their frontier models - it is inherently more communication-efficient, which plays to the 910C's strengths relative to its interconnect limitations.
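
A toy model illustrates the mechanism: if the non-overlapped communication fraction per training step grows with log2 of the chip count, a chip with a weaker interconnect (a larger per-hop constant) loses efficiency faster as the cluster grows. The constants here are purely illustrative, and real clusters decay faster than this simple model because of network contention and stragglers:

```python
import math

def scaling_efficiency(n_chips: int, comm_per_hop: float) -> float:
    """Toy model: efficiency = compute / (compute + comm), with non-overlapped
    communication cost growing with log2(N). comm_per_hop is illustrative."""
    return 1 / (1 + comm_per_hop * math.log2(n_chips))

for n in [8, 64, 256, 1024, 2048]:
    weak = scaling_efficiency(n, 0.05)     # weaker interconnect (HCCS-like)
    strong = scaling_efficiency(n, 0.025)  # stronger interconnect (NVLink-like)
    print(f"{n:>5} chips: weak link {weak:.0%}, strong link {strong:.0%}")
```

The qualitative takeaway matches the table: the efficiency gap between the two interconnects widens with every doubling of cluster size.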

Domestic HBM Supply Chain. One of the most significant supply chain risks for the 910C is memory. HBM2e stacks are manufactured primarily by SK Hynix and Samsung, both South Korean companies that are subject to US pressure regarding technology exports to China. Huawei has been working to qualify domestic HBM alternatives from Chinese memory manufacturers including CXMT (ChangXin Memory Technologies), but domestic HBM production is estimated to be 2-3 generations behind Korean manufacturers. If HBM2e exports to Huawei are restricted in the future, the 910C's memory subsystem could face supply constraints that limit production volume.

Generational and Competitive Context

vs. Huawei Ascend 910B. The 910C is a meaningful but evolutionary upgrade over the 910B. The 50% memory increase (64GB to 96GB) and 50% bandwidth increase (1,200 to 1,800 GB/s) are the most impactful improvements for inference workloads. The ~33% compute improvement matters for training. For existing 910B operators, the 910C is a straightforward upgrade - CANN applications targeting the 910B run on the 910C with recompilation. New deployments in China should default to the 910C unless budget constraints push toward the cheaper 910B.

vs. NVIDIA H100 SXM. The 910C is not and will never be an H100 equivalent. It delivers roughly 50-60% of the H100's performance at 50-60% of the H100's price (if the H100 were available in China, which it is not). The comparison is academic for Chinese organizations because the H100 is not an option. The relevant question is whether the 910C is good enough to train and serve competitive models - and DeepSeek's results demonstrate that it is, when combined with sufficient scale and software optimization.

vs. AMD MI300X. The MI300X outperforms the 910C in every measurable dimension: 2x memory (192GB vs 96GB), 3x bandwidth (5,300 vs 1,800 GB/s), and ~1.6x compute (1,307 vs ~800 TFLOPS FP16). The MI300X is also less expensive on a per-TFLOP basis. But the MI300X is subject to US export controls and unavailable in China, making this comparison purely theoretical for the 910C's target market.
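
The per-TFLOP claim can be checked with midpoint prices from the pricing table earlier in this article (all estimates, not list prices):

```python
# (midpoint of estimated price range in USD, peak FP16 TFLOPS)
accelerators = {
    "Ascend 910C": (15_000, 800),
    "AMD MI300X": (12_500, 1_307),
    "NVIDIA H100 SXM": (32_500, 990),
}
for name, (price, tflops) in accelerators.items():
    print(f"{name}: ${price / tflops:.2f} per peak FP16 TFLOP")
# MI300X ~ $9.56, 910C ~ $18.75, H100 ~ $32.83 - the 910C's premium per TFLOP
# reflects constrained supply and the absence of competing options in China.
```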

vs. Google TPU v6e Trillium. The TPU v6e is cloud-only and available exclusively on Google Cloud - completely inaccessible to Chinese organizations. Architecturally, Trillium's 32GB per chip is less than the 910C's 96GB, but Trillium's pod-scale ICI interconnect at 256 chips provides a very different scaling story. These products serve entirely separate markets.

Export Control Implications. The 910C exists because of - not despite - US export controls. The October 2022 controls (updated in October 2023) banned the sale of accelerators above specific performance thresholds to Chinese entities. This forced Chinese AI companies to choose between stockpiling pre-ban NVIDIA chips (which many did), slowing down AI development, or adopting domestic alternatives like the Ascend line. The 910C's specifications are carefully designed to provide maximum performance within what SMIC's 7nm process can deliver. Any future tightening of controls (for example, restrictions on HBM2e sales to China) could impact the 910C's memory subsystem, though Huawei has been working to qualify domestic HBM suppliers.

DeepSeek's Ascend Strategy. DeepSeek's decision to optimize V4 for Ascend hardware is the most significant validation of the 910C ecosystem. DeepSeek's approach involved several key innovations that extract maximum performance from Ascend constraints:

  • Communication-efficient training algorithms that reduce all-reduce frequency, minimizing HCCS bandwidth limitations
  • Custom operator implementations in CANN that bypass generic framework overhead
  • Mixed-precision strategies optimized for the Da Vinci core's FP16/INT8 capabilities
  • Memory-efficient attention implementations that work within the 96GB budget

These optimizations are not transferable to other hardware, which deepens the Ascend ecosystem lock-in but also proves that the hardware is capable when software is purpose-built for it.

Future Ascend Roadmap. Huawei has not disclosed detailed roadmap information for next-generation Ascend chips. Industry analysts expect a 910D or next-generation Ascend within 12-18 months, likely on an improved SMIC process (possibly 5nm DUV, though SMIC's 5nm readiness is unconfirmed). Key targets for the next generation would include:

  • HBM3 or equivalent memory for higher bandwidth (addressing the 1,800 GB/s bottleneck)
  • FP8 hardware support (closing the gap with NVIDIA and AMD)
  • Improved HCCS interconnect bandwidth (addressing the scaling efficiency gap)
  • Higher core count or clock speed for increased compute throughput

Whether Huawei can deliver on these targets depends largely on SMIC's process development progress and HBM supply chain availability - both factors that are outside Huawei's direct control.

Use Case Recommendations

Strong Fit:

  • Chinese AI companies training domestic models. If you are operating within China and need to train models at 70B+ scale, the 910C is the most capable domestic option. The 96GB memory and ~800 TFLOPS FP16 provide sufficient compute for frontier model training when scaled to hundreds or thousands of chips.
  • Chinese cloud providers building inference capacity. For serving Qwen, Yi, DeepSeek, and other domestic models to Chinese users, the 910C's 96GB memory enables single-chip INT8 serving of 70B-class models. The total cost of ownership is competitive given that NVIDIA alternatives are not available.
  • Government and military AI programs. For applications where supply chain sovereignty is non-negotiable, the 910C - manufactured entirely in China - is the only option that provides complete independence from Western technology supply chains.
  • Organizations that have already invested in CANN. If your engineering team has already built CANN expertise on the 910B, the 910C offers a smooth upgrade path with immediate performance gains and no software rewrite.

Weak Fit:

  • Organizations outside China. The 910C is effectively unavailable outside China. Even if you could procure units, the CANN ecosystem is optimized for Chinese frameworks and Chinese-language documentation. Western organizations should look at AMD MI300X, NVIDIA H100, or Google TPU alternatives.
  • Workloads requiring maximum memory bandwidth. At 1,800 GB/s, the 910C's bandwidth bottleneck limits LLM inference throughput to roughly half of what an H100 or MI300X can achieve. If per-chip inference throughput is the critical metric, the 910C underperforms.
  • Teams without CANN expertise. The CANN learning curve is steep, especially for teams coming from CUDA. English-language documentation is limited, community resources are sparse, and debugging tools are less mature than CUDA's. Budget significant onboarding time.
  • Applications requiring FP8 precision. The 910C does not have confirmed FP8 hardware support. If your inference pipeline relies on FP8 quantization for throughput (as most modern serving engines do), you will be limited to INT8 or FP16 on the 910C, with lower effective throughput than FP8-capable alternatives.
  • Multi-modal AI workloads. Vision-language models, audio processing, and other multi-modal pipelines have less CANN optimization than pure language model workloads. The operator coverage for vision encoders, audio feature extractors, and cross-modal attention on CANN is thinner than for standard Transformer layers.

Decision Framework for Chinese Organizations. For AI teams in China choosing between the 910B and 910C:

| Deciding Factor | Choose 910B | Choose 910C |
|---|---|---|
| Budget per chip | Under $12,000 | $12,000-$18,000 acceptable |
| Target model size (inference) | Sub-34B INT8 | 34B-70B INT8 |
| Cluster scale needed | 1,000+ chips (cost matters) | Under 500 chips (performance per chip matters) |
| Existing infrastructure | Expanding 910B clusters | New deployment |
| Primary workload | Training (large batch) | Inference (memory-bound) |
| Power constraints | Limited facility power | Power budget available |
| Timeline | Need hardware now | Can wait for production ramp |

The 910C is the default choice for new deployments in 2025-2026. The 910B remains relevant for budget-constrained expansions of existing clusters and for training workloads where per-chip performance matters less than aggregate throughput at lower total cost.

Strategic Importance and Investment Trajectory. The Chinese government has committed significant resources to domestic AI chip development. National and provincial subsidies for Ascend hardware procurement reduce the effective cost for Chinese organizations, making the 910C's price-performance equation more favorable than the list pricing suggests. The "New Generation AI Development Plan" and related policy initiatives treat domestic AI chip adoption as a national priority, which provides demand certainty that supports Huawei's continued R&D investment in the Ascend line. This government backing is a structural advantage that no Western AI chip competitor can replicate - it provides a guaranteed customer base regardless of competitive performance gaps.

For the broader global AI industry, the 910C represents a significant data point. It demonstrates that export controls have not halted Chinese AI hardware development - they have redirected it toward domestic alternatives. The performance gap between the 910C and the H100 (roughly 0.5-0.6x) is smaller than many analysts predicted when controls were first imposed. Whether this gap will narrow or widen in future generations depends on SMIC's process development progress and Huawei's ability to source advanced memory and packaging technologies under ongoing restrictions.

Strengths

  • 96GB HBM2e - the highest memory capacity in the Ascend line, enabling INT8 serving of 70B-class models on a single chip
  • Estimated ~800 TFLOPS FP16 - roughly 2.5x the A100's 312 TFLOPS peak and competitive for many training workloads
  • Manufactured entirely within China, immune to US export control disruptions
  • CANN software stack has matured significantly with DeepSeek and other frontier model training
  • Growing domestic ecosystem with support from Baidu, Alibaba, Tencent, ByteDance, and DeepSeek
  • 600W TDP is moderate compared to H100's 700W and MI300X's 750W
  • Strategic importance ensures continued Chinese government investment and subsidies

Weaknesses

  • Memory bandwidth (~1,800 GB/s) is roughly 54% of the H100's 3,350 GB/s - a significant bottleneck for inference
  • SMIC 7nm process trails TSMC by 2+ technology generations, limiting transistor density and power efficiency
  • CANN software ecosystem is far smaller than CUDA - fewer libraries, tools, and community resources
  • No FP8 hardware support confirmed, limiting performance on quantized inference workloads
  • Higher estimated price-per-TFLOP than the AMD MI300X despite lower absolute performance
  • Effectively unavailable outside China due to export restrictions and supply priorities
  • Multi-chip interconnect (HCCS) bandwidth is not competitive with NVLink or Infinity Fabric for large-scale training

About the author: AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.