Claude Hits Azure GA on NVIDIA's Blackwell Ultra Hardware
Anthropic's Claude models are now generally available in Microsoft Foundry on Azure, running on NVIDIA GB300 Blackwell Ultra NVL72 racks - with vendor numbers claiming 40% faster token generation than H100.

Anthropic's Claude models reached general availability in Microsoft Foundry on Azure on June 29, running on NVIDIA GB300 Blackwell Ultra NVL72 racks. The launch marks the first time enterprise teams can access Claude through their Azure billing account with full Azure-native identity, networking, and governance - no separate Anthropic subscription required.
The headline number: Microsoft says Claude Sonnet on GB300 produces tokens 40% faster than on H100 nodes and roughly 15% faster than on B200 systems. At 1-million-token context lengths, Azure reports that inference latency drops by up to 6x versus the previous H100 deployment. Those figures are vendor-measured in Microsoft's own environment, not independently validated, and they should be treated accordingly until customers can publish their own runs.
Key Specs
| Component | Detail |
|---|---|
| GPUs per rack | 72 Blackwell Ultra (GB300) |
| CPUs per rack | 36 NVIDIA Grace (Arm) |
| Fast memory | 37 TB total; 192 GB HBM3e per GPU |
| FP4 performance | 1,440 petaflops per rack |
| NVLink fabric | 1.8 TB/s per GPU |
| Interconnect | Quantum-X800 InfiniBand |
| vs H100 (vendor claim) | ~40% faster token generation |
| vs B200 (vendor claim) | ~15% faster token generation |
The Hardware
Memory and Compute
The GB300 NVL72 is a 48U rack that unifies 72 Blackwell Ultra GPUs and 36 Arm-based Grace CPUs into a single liquid-cooled unit. Each GPU carries 192 GB of HBM3e memory, 2.4 times the H100's 80 GB, and the rack as a whole offers 37 TB of fast memory. That headroom is what makes very long context windows practical without aggressive KV-cache eviction.
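To make the KV-cache point concrete, here is a rough sizing sketch. Claude's architecture is not public, so the layer count, KV-head count, and head dimension below are hypothetical placeholders chosen only for illustration; the 80 GB and 192 GB capacities are the per-GPU figures quoted in this article.

```python
# Rough KV-cache sizing for a decoder-only transformer.
# ASSUMPTION: the model dimensions used below are hypothetical placeholders;
# Claude's real architecture is not public.

def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value):
    """Bytes needed to cache keys and values for `tokens` positions."""
    # Factor of 2: one key vector and one value vector per layer, head, token.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens

GB = 10**9
H100_GB, GB300_GB = 80, 192  # per-GPU capacities quoted in the article

for label, width in (("16-bit", 2), ("8-bit", 1)):
    size_gb = kv_cache_bytes(
        tokens=1_000_000, layers=80, kv_heads=8, head_dim=128,
        bytes_per_value=width,
    ) / GB
    print(f"{label} cache at 1M tokens: {size_gb:.1f} GB "
          f"(fits in {H100_GB} GB: {size_gb <= H100_GB}, "
          f"fits in {GB300_GB} GB: {size_gb <= GB300_GB})")
```

Under these made-up dimensions, a single 1M-token sequence needs roughly 328 GB of cache at 16-bit precision and 164 GB at 8-bit, before counting model weights. Neither fits in 80 GB, which is the kind of pressure that forces eviction or sharding across GPUs.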
FP4 throughput lands at 1,440 petaflops per rack. The cooling is hybrid: GPUs, CPUs, and NVSwitch are liquid-cooled, while OSFP modules and drives stay air-cooled. Power draw runs up to 140 kW per rack, which puts facility requirements firmly in the "serious data center" category.
Fabric
Every GPU in the rack connects over NVLink 5.0, which provides 1.8 TB/s per GPU and 130 TB/s of aggregate NVLink bandwidth across all 72 chips. External connectivity runs over Quantum-X800 InfiniBand. NVIDIA quotes a 50x overall throughput improvement compared with Hopper-generation AI factories, but that figure comes from whole-factory comparisons, not single-model inference jobs.
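The rack-level figures can be cross-checked against the per-GPU ones with a few lines of arithmetic. This is only a sanity check on the numbers quoted in this article; the variable names are ours.

```python
# Cross-check rack-level figures against per-GPU figures from the article.
GPUS_PER_RACK = 72
NVLINK_TB_S_PER_GPU = 1.8
HBM_GB_PER_GPU = 192
RACK_FP4_PFLOPS = 1440

aggregate_nvlink_tb_s = GPUS_PER_RACK * NVLINK_TB_S_PER_GPU
total_hbm_tb = GPUS_PER_RACK * HBM_GB_PER_GPU / 1000
fp4_pflops_per_gpu = RACK_FP4_PFLOPS / GPUS_PER_RACK

print(f"Aggregate NVLink: {aggregate_nvlink_tb_s:.1f} TB/s")
print(f"Total HBM3e:      {total_hbm_tb:.1f} TB")
print(f"FP4 per GPU:      {fp4_pflops_per_gpu:.0f} petaflops")
```

The NVLink product is 129.6 TB/s, which matches the rounded 130 TB/s figure, and FP4 works out to 20 petaflops per GPU. The HBM3e sum is about 13.8 TB, so the 37 TB "fast memory" figure evidently counts more than GPU HBM alone.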
The GB300 NVL72 integrates 72 Blackwell Ultra GPUs and 36 Grace CPUs into a single 48U liquid-cooled rack.
Source: press.asus.com
The Performance Numbers
Microsoft's published comparisons put Claude Sonnet on GB300 ahead of both predecessor generations on raw throughput and latency:
| Metric | H100 | B200 | GB300 |
|---|---|---|---|
| GB300 token generation speedup over this system | ~40% | ~15% | - |
| Long-context latency (1M tokens) | Baseline | Not reported | Up to 6x lower |
| Memory per GPU | 80 GB HBM3 | 192 GB HBM3e | 192 GB HBM3e |
The 40% throughput gain over H100 is plausible given the architectural jump - GB300 physically has more memory bandwidth and faster tensor cores. The 6x latency claim for million-token context is harder to assess without knowing batch size and prompt length distributions. Those figures matter enormously for long-context inference jobs, where memory bandwidth, not compute, is usually the bottleneck.
Context matters: these are single-vendor benchmarks measured on Azure's infrastructure at Microsoft's discretion. They match what a reasonable architectural analysis would predict, but independent customer validation hasn't surfaced publicly yet.
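The two speedup claims can be combined with simple arithmetic. The sketch below takes the vendor figures at face value (GB300 at 1.40x H100 and 1.15x B200 on token generation) and assumes both were measured on the same workload; the derived values are our own calculation, not numbers Microsoft has published.

```python
# Derive implied ratios from the two vendor speedup claims.
GB300_VS_H100 = 1.40  # GB300 tokens/s relative to H100 (vendor claim)
GB300_VS_B200 = 1.15  # GB300 tokens/s relative to B200 (vendor claim)

# ASSUMPTION: both claims refer to the same workload, so they can be divided.
b200_vs_h100 = GB300_VS_H100 / GB300_VS_B200

# A throughput gain of s means each token takes 1/s of the original time.
time_saved_vs_h100 = 1 - 1 / GB300_VS_H100

print(f"Implied B200 vs H100: {b200_vs_h100:.2f}x")
print(f"Time per token saved vs H100: {time_saved_vs_h100:.1%}")
```

Under that assumption, B200 would sit at roughly 1.22x H100, and a 40% throughput gain shortens a fixed-length generation by about 28.6%, not 40%.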
Getting Started
Accessing Claude through Foundry uses the AnthropicFoundry client from Anthropic's SDK. Microsoft Entra ID is the recommended auth method for production; API keys work for everything except Mythos 5 and Mythos Preview.
from anthropic import AnthropicFoundry
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

# Endpoint of your Foundry resource and the deployment name chosen at provisioning.
base_url = "https://<resource-name>.services.ai.azure.com/anthropic"
deployment_name = "claude-sonnet-5"

# Entra ID auth: the provider fetches and refreshes bearer tokens for this scope.
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default",
)

client = AnthropicFoundry(
    azure_ad_token_provider=token_provider,
    base_url=base_url,
)

# The deployment name, not the public model ID, goes in the model field.
message = client.messages.create(
    model=deployment_name,
    messages=[{"role": "user", "content": "Your prompt here"}],
    max_tokens=2048,
)
print(message.content)
The base URL format is fixed: https://<resource-name>.services.ai.azure.com/anthropic. The deployment name you chose during provisioning routes the request to a specific model version.
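Because the URL shape is fixed, it can be built from the resource name rather than pasted by hand. The helper below is a minimal sketch: the function name and the example resource name are hypothetical, and the character check is a conservative guess for illustration, not Azure's official naming rule.

```python
import re

def foundry_base_url(resource_name: str) -> str:
    """Build the Claude endpoint URL for a Foundry resource name."""
    # ASSUMPTION: letters, digits, and hyphens only. This is a conservative
    # illustrative check, not Azure's documented naming rule.
    if not re.fullmatch(r"[A-Za-z0-9][A-Za-z0-9-]*", resource_name):
        raise ValueError(f"unexpected resource name: {resource_name!r}")
    return f"https://{resource_name}.services.ai.azure.com/anthropic"

# "contoso-ai" is a made-up resource name.
print(foundry_base_url("contoso-ai"))
```

A check like this mainly catches two easy mistakes: leaving the `<resource-name>` placeholder unfilled, or passing a full URL where only the name is expected.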
Available models and deployment options as of GA:
| Model | Hosted on Azure | Auth | Region Scope |
|---|---|---|---|
| Claude Opus 4.8 | Yes | Entra ID or API key | Global / Data Zone (US) |
| Claude Sonnet 4.6 | Yes | Entra ID or API key | Global Standard |
| Claude Sonnet 5 | Yes | Entra ID or API key | Global Standard |
| Claude Haiku 4.5 | Yes | Entra ID or API key | Global Standard |
| Claude Mythos 5 | Anthropic-hosted only | Entra ID only | Global Standard |
Global Standard deployments land in East US2 or Sweden Central. Claude Opus 4.8 on Azure also supports Data Zone Standard (US) for US data-residency requirements.
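One way to keep these constraints from surprising a deployment pipeline is to encode the table as data and check a planned configuration against it. The sketch below transcribes the table above; the class, dictionary, and function names are illustrative and not part of any Azure or Anthropic SDK.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelOption:
    api_key_allowed: bool
    scopes: tuple

# Transcribed from the GA availability table; illustrative structure only.
MODEL_OPTIONS = {
    "Claude Opus 4.8": ModelOption(True, ("Global Standard", "Data Zone Standard (US)")),
    "Claude Sonnet 4.6": ModelOption(True, ("Global Standard",)),
    "Claude Sonnet 5": ModelOption(True, ("Global Standard",)),
    "Claude Haiku 4.5": ModelOption(True, ("Global Standard",)),
    "Claude Mythos 5": ModelOption(False, ("Global Standard",)),
}

def config_problems(model: str, auth: str, scope: str) -> list:
    """Return readable problems; an empty list means the combination is listed."""
    option = MODEL_OPTIONS.get(model)
    if option is None:
        return [f"{model} is not in the GA table"]
    problems = []
    if auth not in ("entra_id", "api_key"):
        problems.append(f"unknown auth method: {auth}")
    elif auth == "api_key" and not option.api_key_allowed:
        problems.append(f"{model} supports Entra ID only")
    if scope not in option.scopes:
        problems.append(f"{model} is not offered as {scope}")
    return problems

print(config_problems("Claude Mythos 5", "api_key", "Global Standard"))
print(config_problems("Claude Opus 4.8", "entra_id", "Data Zone Standard (US)"))
```

The first call reports the Entra ID restriction; the second returns an empty list because Opus 4.8 is the one model listed with a US Data Zone option.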
Enterprise Claude workloads in Azure run on Anthropic's own GB300 NVL72 racks rather than shared multi-tenant GPU pools.
Source: pexels.com
What Didn't Move
The Azure path isn't a full replacement for Anthropic's direct API. Several features either don't transfer or behave differently:
Data residency caveats exist. Prompts and completions stay within Azure for the "Hosted on Azure" deployment path. However, Microsoft explicitly notes that Excel Agent Mode and Copilot Researcher - two Claude integrations available in Microsoft 365 - run on Anthropic-managed infrastructure outside Azure's data-residency commitment. If your compliance requirement is "Claude traffic stays in Azure," check which specific integration you're using.
Pricing isn't public yet for the GB300 tier. Anthropic's standard API pricing ($2/$10 per million tokens for Sonnet 5 through August 31) is the reference point, but Azure Marketplace billing adds a layer. The faster throughput on GB300 doesn't automatically mean lower cost per token - you're paying for a different infrastructure tier.
Mythos 5 can't use API keys. Claude Mythos 5 and Mythos Preview support Entra ID authentication only. If your pipeline relies on static API key rotation, you need to redesign the auth flow before using those models on Azure.
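On the pricing point, the reference rates at least allow a back-of-the-envelope estimate. The sketch below reads "$2/$10" as dollars per million input and output tokens respectively, which is an assumption about the quoted figure, and it uses Anthropic's direct API rates; actual Azure Marketplace charges for the GB300 tier are not public and may differ.

```python
def request_cost_usd(input_tokens, output_tokens,
                     input_per_mtok=2.00, output_per_mtok=10.00):
    """Estimate one request's cost from per-million-token rates."""
    # ASSUMPTION: defaults are the Sonnet 5 reference rates quoted in the
    # article, read as input/output prices. Azure billing may differ.
    return (input_tokens / 1_000_000 * input_per_mtok
            + output_tokens / 1_000_000 * output_per_mtok)

# A full 1M-token prompt with a 2,048-token reply.
cost = request_cost_usd(1_000_000, 2_048)
print(f"Estimated cost: ${cost:.4f}")
```

At these rates a single million-token prompt costs about $2.02, almost all of it input, so long-context workloads are dominated by prompt size rather than generation speed.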
Where It Falls Short
The 40% H100 speedup claim sounds clean, but the conditions matter. That figure likely reflects single-request throughput rather than real-world batch serving patterns, where memory bandwidth contention and scheduling decisions affect actual throughput. It also doesn't say what model size and context length the benchmark ran at.
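Teams that want their own numbers can start with a small single-request timing harness like the sketch below. It works on any iterable of streamed text chunks, so it is demonstrated here with a fake stream; the function names are ours, and streamed chunks are not the same thing as tokens, so treat the rate as a proxy.

```python
import time

def measure_stream(chunks, clock=time.perf_counter):
    """Return (time_to_first_chunk_s, chunks_per_s) for one streamed response.

    chunks_per_s is measured after the first chunk arrives, and is None
    when fewer than two chunks were received.
    """
    start = clock()
    first = last = None
    count = 0
    for _ in chunks:
        last = clock()
        if first is None:
            first = last
        count += 1
    if count == 0:
        raise ValueError("stream produced no chunks")
    rate = (count - 1) / (last - first) if count > 1 and last > first else None
    return first - start, rate

def fake_stream(n=5, delay_s=0.01):
    """Stand-in for a model response stream (illustration only)."""
    for i in range(n):
        time.sleep(delay_s)
        yield f"chunk{i} "

ttft, rate = measure_stream(fake_stream())
print(f"time to first chunk: {ttft:.3f}s, rate: {rate:.1f} chunks/s")
```

Separating time-to-first-chunk from steady-state rate matters because long prompts mostly inflate the former. A harness like this still measures one request at a time, so it says nothing about throughput under concurrent load.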
Microsoft and Anthropic report early pilots where Claude+Phi-4 multi-model chains reached 30% accuracy improvements in enterprise workflows. One case study from a nuclear safety firm cites reducing a 200-day human review process to one day using Claude on Foundry. These are compelling anecdotes, but they're cherry-picked success stories from customers motivated to praise the platform. Representative average-case performance data isn't public.
The region limitation also matters. Global Standard deployments are confined to East US2 and Sweden Central as of GA, with more regions scheduled for Q3 2026. Enterprise customers with latency-sensitive workloads in other regions need to wait.
NVIDIA's Justin Boitano described Claude on GB300 as targeting "complex technical work" requiring "strong reasoning and coding capabilities." That framing is accurate - this isn't about cheaper chat; it's about enabling the long-context agent workflows that previously choked on H100 memory limits. Whether the raw performance claims hold under production conditions is the question that only customer benchmark data can answer.
Sources:
- Claude in Microsoft Foundry: GA Announcement - Anthropic, June 29, 2026
- Claude Meets Blackwell Ultra: NVIDIA Blog - NVIDIA, July 2026
- Deploy and use Claude models in Microsoft Foundry - Microsoft Learn
- Claude in Microsoft Foundry is Now Generally Available - Microsoft Azure Blog, June 29, 2026
- NVIDIA GB300 NVL72 Product Page - NVIDIA
