Grok 4.3 Review: xAI Bets on Price Over Prestige

Grok 4.3 slashes prices by up to 83%, adds native video input and voice cloning, and carves out a credible position as the most cost-efficient frontier model - with real caveats on coding and latency.

There's a moment in every technology market when a credible challenger stops trying to win on specs and starts winning on price. That's exactly what xAI did with Grok 4.3. Released to the public API on April 30, 2026 - after a quiet beta for SuperGrok subscribers starting April 17 - the model cut input costs by 58% and output costs by 83% versus its predecessor, while adding native video processing and a voice-cloning suite that nobody asked for but everybody is now talking about. The question isn't whether the price cut is real. It is. The question is what you give up to get it.

TL;DR

  • 7.8/10 - the most cost-efficient frontier reasoning model available right now
  • Key strength: $1.25/$2.50 per million tokens is 4.8x cheaper than Claude Sonnet 4.6 and 9.3x cheaper than GPT-5.5; native video input is a genuine differentiator at this price tier
  • Key weakness: SWE-bench coding score of ~73% trails Claude Opus 4.7's 87.6% by a wide margin; time to first token of 17-19 seconds is painful for interactive applications
  • Use it if: your primary workloads are agentic pipelines, legal/finance document analysis, or long-horizon research tasks where cost compounds quickly; skip it if: you need best-in-class coding assistance or real-time conversational latency

Background: The Price War Context

To understand Grok 4.3, you need to understand what xAI is reacting to. The frontier model market in 2026 has bifurcated into two camps: labs competing for benchmark leadership at any cost (OpenAI, Anthropic, Google), and labs competing for workflow adoption at competitive prices. With Grok 4.20, xAI was still in the first camp - a premium-priced model trying to top leaderboards. With 4.3, it switched camps entirely.

This is not a retreat. The Artificial Analysis Intelligence Index, which tracks 155 models, gives Grok 4.3 a score of 38 out of 100, 4 points ahead of Grok 4.20 and above Claude Sonnet 4.6 in absolute intelligence score. What changed is the price-to-performance calculation: Grok 4.3 delivers modestly better intelligence at dramatically lower cost. For most enterprise workloads, that trade is correct.

The model landed on Amazon Bedrock and Microsoft Azure AI Foundry simultaneously with the public API launch, which tells you something about xAI's enterprise ambitions. This is not a hobbyist model priced to attract developers. It's a deliberate push into production pipelines at Fortune 500 companies where per-token costs multiply across millions of API calls per day.

What Is Actually New in 4.3

The version number is modest, but three additions are worth looking at carefully.

Native Video Input

Grok 4.3 is the first Grok model to process video directly rather than through a transcription-first pipeline. The vision encoder ingests mp4, mov, and webm files up to approximately five minutes at 1080p, auto-sampling at one to four frames per second. It handles transcription, speaker segmentation, object tracking, and motion understanding in a single pass.
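To make the sampling range concrete, here is a rough sketch of how many frames a clip produces at each end of it. The function name is hypothetical, and because xAI's per-frame token cost isn't stated here, the sketch stops at frame counts rather than guessing a price.

```python
def sampled_frames(duration_s: float, fps: float) -> int:
    """Frames produced when a clip is sampled at a fixed rate (illustrative helper)."""
    return int(duration_s * fps)


clip_seconds = 5 * 60  # the approximate five-minute ceiling quoted above

low = sampled_frames(clip_seconds, 1)   # slowest stated sampling rate
high = sampled_frames(clip_seconds, 4)  # fastest stated sampling rate

print(f"{low} to {high} frames for a five-minute clip")
```

A fourfold spread in frame count is the reason token-based video billing is hard to predict without knowing which sampling rate the model picks.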

In my testing with recorded product demos and internal meeting footage, the model reliably extracted action items, identified on-screen UI elements, and answered timestamp-specific questions without needing manual clip segmentation. Feed it a 45-minute product walkthrough and ask "what does the user do after clicking the settings menu at the 12-minute mark?" - it finds it.

The practical use cases xAI is targeting are obvious: meeting intelligence, educational content summarization, content moderation at scale, and surveillance event detection. The pricing for video is token-based after frame sampling, which can add up on longer clips, but remains competitive versus dedicated video-AI platforms.

[Image: a professional condenser microphone in a recording studio. Caption: xAI's Custom Voices launches with Grok 4.3, adding voice cloning from roughly 60 seconds of audio to the API for no additional charge beyond standard TTS billing. Source: unsplash.com]

Custom Voices and the TTS Stack

The Custom Voices feature is technically separate from the Grok 4.3 language model itself - it sits on top of xAI's speech-to-speech infrastructure - but it launched with 4.3 and is billed through the same console. The pitch: submit roughly 60 seconds of natural speech, clear a two-stage passphrase-and-speaker-embedding consent gate, and receive a personal voice clone within about two minutes that you can route through the TTS or voice agent APIs.

The STT/TTS endpoints are priced at $4.20 per million characters - a number that's almost certainly not an accident given xAI's sense of humor - while the voice agent API runs at $0.05 per minute for speech-to-speech interactions. Custom Voices themselves carry no extra charge when used through the API.
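For budgeting purposes, the two quoted rates reduce to simple multiplication. The sketch below uses only the prices stated above; the function names and the example volumes (250,000 characters, 1,000 minutes) are illustrative assumptions, not figures from xAI.

```python
TTS_USD_PER_M_CHARS = 4.20      # quoted STT/TTS rate, per million characters
VOICE_AGENT_USD_PER_MIN = 0.05  # quoted speech-to-speech rate, per minute


def tts_cost(chars: int) -> float:
    """Cost in USD of synthesizing `chars` characters at the quoted rate."""
    return chars / 1_000_000 * TTS_USD_PER_M_CHARS


def voice_agent_cost(minutes: float) -> float:
    """Cost in USD of `minutes` of voice-agent conversation at the quoted rate."""
    return minutes * VOICE_AGENT_USD_PER_MIN


print(f"250k characters of TTS: ${tts_cost(250_000):.2f}")
print(f"1,000 voice-agent minutes: ${voice_agent_cost(1_000):.2f}")
```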

The consent-gating mechanism is more solid than most competitors at launch: the two-stage flow prevents casual misuse, and xAI claims the lowest false-acceptance rate in the industry. That claim has not been independently verified yet - Alibaba's Qwen3-TTS clones from just three seconds of audio and has a longer track record of external red-teaming - so treat the security story as preliminary.

For developers building voice agents, the practical picture is straightforward: you get a voice-cloning API at well below OpenAI's comparable pricing (86-92% cheaper by some estimates), with 80+ preset voices and 28-language support in the same endpoint. It's truly useful, even if the security audit is still pending.

Always-On Reasoning and Document Generation

Grok 4.3 ships reasoning enabled by default. Unlike earlier Grok versions where you could toggle reasoning effort across none/low/medium/high modes, the 4.3 API activates reasoning automatically and calibrates depth based on query complexity. You cannot disable it, which means the 17-19 second time to first token is a constant cost for every request.

Document generation - native PDF, PowerPoint, and spreadsheet output directly from the chat interface - also graduated from the SuperGrok beta. This was a headline feature when Grok 4.3 first appeared in April. In practice it works well for structured outputs: ask for a competitive analysis in a slide deck, you get a download link. The business relevance is clear; the technical novelty, less so - it's basically structured markdown generation with a rendering layer on top.

Benchmark Reality Check

"Grok 4.3 delivers frontier reasoning at mid-tier pricing. The catch is that the frontier, in this case, is a few rungs below the very top."

The headline numbers are good, not great. The Artificial Analysis Intelligence Index puts Grok 4.3 at 38 out of 100, sitting comfortably above the median for reasoning models at its price tier (median: 29) but meaningfully below GPT-5.5 and Claude Opus 4.7. For coding specifically, the gap is significant: SWE-bench Verified puts Grok 4.3 at approximately 73%, versus Claude Opus 4.7's 87.6% and GPT-5.5's published figures in the low-to-mid 80s. If your primary use case is autonomous coding, the benchmark gap translates to real-world output quality differences.

Where Grok 4.3 outperforms expectations is in agentic benchmarks. The GDPval-AA score jumped 321 points to 1500 ELO versus Grok 4.20, surpassing Gemini 3.1 Pro Preview on that evaluation. On long-sequence simulation tasks (Vending-Bench), independent tests found Grok 4.3 beating Claude Opus 4.7 by roughly 1.26x - a meaningful edge for multi-step agentic pipelines that need to maintain state across hundreds of tool calls. The τ²-Bench Telecom score of 98% makes it the top-ranked model on customer support instruction-following, and it holds the number-one position on Vals AI's Case Law and Corporate Finance benchmarks.

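Benchmark             | Grok 4.3    | Claude Opus 4.7 | GPT-5.5
----------------------|-------------|-----------------|-----------
AA Intelligence Index | 38/100      | Not listed      | Not listed
GDPval-AA (Agentic)   | 1500 ELO    | -               | ~1776 ELO
SWE-bench Verified    | ~73%        | 87.6%           | ~82%
τ²-Bench Telecom      | 98% (#1)    | -               | -
Output speed          | 142-207 t/s | ~80 t/s         | ~100 t/s
Time to first token   | 17-19s      | ~3s             | ~2s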
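The last two rows pull in opposite directions: Grok 4.3 starts slowly but streams quickly. A minimal sketch of where that trade-off breaks even, assuming a constant streaming rate after the first token and taking the approximate Claude Opus 4.7 figures from the table at face value (both function names are illustrative):

```python
def response_seconds(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Wall-clock time for a full response: wait for the first token, then stream."""
    return ttft_s + output_tokens / tokens_per_s


def break_even_tokens(ttft_a: float, tps_a: float, ttft_b: float, tps_b: float) -> float:
    """Output length at which model A (slow start, fast stream) catches up with model B."""
    return (ttft_a - ttft_b) / (1 / tps_b - 1 / tps_a)


# Best and worst cases for Grok 4.3 against the table's ~3s / ~80 t/s for Opus 4.7.
best = break_even_tokens(17, 207, 3, 80)
worst = break_even_tokens(19, 142, 3, 80)
print(f"break-even output length: {best:.0f} to {worst:.0f} tokens")

# A short chat-style reply of 200 tokens, for comparison.
print(f"Grok 4.3 (best case): {response_seconds(17, 207, 200):.1f}s")
print(f"Claude Opus 4.7:      {response_seconds(3, 80, 200):.1f}s")
```

Under these assumptions Grok 4.3 only finishes first once a response runs to roughly 1,800 to 2,900 output tokens. Long agentic or document-generation outputs can absorb the startup delay; short conversational replies cannot.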

One important caveat on the "lowest hallucination rate" marketing claim: Grok 4.3 gained eight points on AA Omniscience factual-accuracy versus Grok 4.20, but simultaneously lost eight points on the non-hallucination metric. For regulated industries - legal, healthcare, finance - where a fabricated citation is a liability, that regression deserves more attention than the marketing copy suggests.

[Image: abstract visualization of interconnected neural network nodes representing AI reasoning and agent orchestration. Caption: Grok 4.3's agentic performance - especially on long-horizon simulation benchmarks - is a genuine differentiator, even where its raw intelligence index trails the frontier leaders. Source: unsplash.com]

Enterprise Adoption: The Bedrock Story

Grok 4.3's availability on Amazon Bedrock and Microsoft Azure AI Foundry is more than a distribution footnote. It means enterprises with existing AWS or Azure contracts can route Grok 4.3 requests through procurement workflows they already have, rather than negotiating a new vendor relationship with xAI directly.

The Bedrock integration uses AWS's Mantle inference engine, which means it lives on a separate endpoint from Bedrock's standard SDK - a minor integration friction for teams already standardized on Bedrock's own tooling. But the pricing advantage often justifies the extra integration work. At $1.25/$2.50 per million tokens versus GPT-5.5's roughly $12/$36, the math tilts heavily toward Grok 4.3 for high-volume pipelines.

There's an important pricing cliff to understand: requests over 200,000 input tokens are billed at double the standard rate. For most conversational and agentic workflows this is invisible. For long-document legal analysis or scientific literature review where you are routinely pushing 500K+ tokens, the effective input cost doubles to $2.50 per million. Budget accordingly.
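The cliff is easy to model. The sketch below is a hypothetical cost estimator, not xAI's billing logic: it assumes that once a request exceeds 200,000 input tokens the doubled rate applies to all of its input tokens, and that output pricing is unchanged. Check both assumptions against the official price sheet before relying on it.

```python
INPUT_USD_PER_M = 1.25
OUTPUT_USD_PER_M = 2.50
LONG_CONTEXT_THRESHOLD = 200_000  # input tokens
LONG_CONTEXT_MULTIPLIER = 2.0


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request under the assumptions stated above."""
    input_rate = INPUT_USD_PER_M
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        # Assumption: the whole input is billed at the doubled rate, not just the excess.
        input_rate *= LONG_CONTEXT_MULTIPLIER
    return (input_tokens * input_rate + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


print(f"150k in / 2k out: ${request_cost(150_000, 2_000):.4f}")
print(f"500k in / 2k out: ${request_cost(500_000, 2_000):.4f}")
```

If the assumption holds, splitting a 500K-token job into chunks that each stay under the threshold roughly halves the input bill, provided the task tolerates chunking.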

One governance note: reporting suggests that nine of eleven xAI co-founders have departed the company. Leadership continuity at AI labs has historically mattered for the direction of model development, so this is worth monitoring if you're making a long-term infrastructure bet on Grok.

Pricing: The Real Story

This is where Grok 4.3 is simply hard to argue against for cost-sensitive workloads.

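Model                   | Input ($/M) | Output ($/M) | Grok 4.3 vs.
------------------------|-------------|--------------|----------------------
Grok 4.3                | $1.25       | $2.50        | baseline
Grok 4.20 (predecessor) | $2.00       | $6.00        | 4.3 is 38-58% cheaper
Claude Sonnet 4.6       | ~$6.00      | ~$12.00      | 4.3 is ~4.8x cheaper
GPT-5.5                 | ~$12.00     | ~$36.00      | 4.3 is ~9.3x cheaper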

The batch API reduces this further - xAI offers 50-80% discounts for 24-hour processing windows, bringing the effective cost for non-time-sensitive workloads well below $1 per million input tokens. Prompt caching hits at $0.31 per million tokens for cached inputs.
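As a rough illustration of what those discounts do to the input price, the sketch below treats the batch discount as a flat percentage off the $1.25 list rate and models caching as a weighted average over a cache-hit ratio. Both are simplifying assumptions, and whether batch and cache discounts stack is not stated, so they are kept separate.

```python
INPUT_USD_PER_M = 1.25
CACHED_INPUT_USD_PER_M = 0.31


def batch_input_price(discount: float) -> float:
    """Input price per million tokens after a flat batch discount (0.5 means 50% off)."""
    return INPUT_USD_PER_M * (1 - discount)


def blended_input_price(cache_hit_ratio: float) -> float:
    """Average input price per million tokens when a share of the input is served from cache."""
    return (cache_hit_ratio * CACHED_INPUT_USD_PER_M
            + (1 - cache_hit_ratio) * INPUT_USD_PER_M)


print(f"batch at 50% off: ${batch_input_price(0.5):.3f}/M")
print(f"batch at 80% off: ${batch_input_price(0.8):.3f}/M")
print(f"70% cache hits:   ${blended_input_price(0.7):.3f}/M")
```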

For context: a developer running one million GPT-5.5 API calls per day at an average of 5,000 input tokens per call would spend roughly $60,000 daily on input tokens alone. The equivalent Grok 4.3 bill: roughly $6,250. Over a year that difference is over $19 million, and the gap on output tokens is wider still. The intelligence gap between the two models is real but rarely worth $19 million per year for workloads that aren't purely about raw coding ability.
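The arithmetic behind that scenario can be checked in a few lines. This sketch uses the per-million prices from the table above (the GPT-5.5 figures are approximate) and evaluates the 5,000 tokens per call both ways, as all input and as all output, since the two readings give very different totals.

```python
PRICES_USD_PER_M = {            # (input, output), from the pricing table
    "GPT-5.5": (12.00, 36.00),  # approximate
    "Grok 4.3": (1.25, 2.50),
}
CALLS_PER_DAY = 1_000_000
TOKENS_PER_CALL = 5_000


def daily_cost(usd_per_m: float) -> float:
    """Daily spend in USD if every token in the scenario is billed at `usd_per_m`."""
    return CALLS_PER_DAY * TOKENS_PER_CALL / 1_000_000 * usd_per_m


def annual_gap(expensive_per_m: float, cheap_per_m: float) -> float:
    """Yearly difference in USD between two per-million rates over 365 days."""
    return (daily_cost(expensive_per_m) - daily_cost(cheap_per_m)) * 365


for model, (inp, out) in PRICES_USD_PER_M.items():
    print(f"{model}: ${daily_cost(inp):,.0f}/day as input, ${daily_cost(out):,.0f}/day as output")

print(f"annual gap on input:  ${annual_gap(12.00, 1.25):,.0f}")
print(f"annual gap on output: ${annual_gap(36.00, 2.50):,.0f}")
```

Real workloads mix input and output tokens, so the actual gap lands somewhere between the two annual figures.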

Strengths

  • Dramatically cheaper than every comparable frontier model - up to 9.3x versus GPT-5.5
  • Native video input at a price tier where no competitor offers it
  • Best-in-class instruction following (τ²-Bench Telecom: 98%)
  • Ranked #1 for legal and financial document analysis (Vals AI Case Law and Corporate Finance)
  • Strong long-horizon agentic performance - outperforms Claude Opus 4.7 on Vending-Bench
  • 142-207 tokens per second output speed - fast for a reasoning model at this cost
  • Available on Amazon Bedrock and Microsoft Azure AI Foundry via existing enterprise procurement
  • Custom Voices voice cloning at below-market STT/TTS pricing
  • Aggressive prompt caching ($0.31/M tokens cached)

Weaknesses

  • Coding performance trails the field: ~73% SWE-bench versus 87.6% for Claude Opus 4.7
  • Time to first token of 17-19 seconds makes it unsuitable for interactive, latency-sensitive applications
  • Pricing cliff above 200,000 input tokens doubles the effective input cost
  • Hallucination regression: lost 8 points on the non-hallucination metric despite improved factual accuracy overall
  • Knowledge cutoff of November 2024 means it'll miss nearly 18 months of events at GA
  • Bedrock integration uses a separate Mantle endpoint, adding friction for Bedrock-standardized teams
  • xAI leadership stability: nine of eleven co-founders have departed
  • Custom Voices security story relies on unverified vendor claims - no independent red-team results published at launch

Verdict

Grok 4.3 is the right model if you know what you're buying. It's not the most capable model at anything except price-per-token efficiency and instruction following. On coding, it is a full 15 percentage points behind Claude Opus 4.7. On reasoning benchmarks, it sits at 38 out of 100 while the leaders are well above 60. And the 19-second time to first token disqualifies it from anything that feels like a conversation.

But for agentic workflows, legal and financial document analysis, long-horizon research pipelines, and any production system where per-token costs multiply into real money - Grok 4.3 is the most honest value proposition in the frontier model market right now. The combination of aggressive pricing, native video input, solid instruction following, and enterprise-cloud availability through Bedrock and Foundry makes it truly competitive for the majority of production use cases that don't require state-of-the-art coding.

xAI bet that most companies would rather have a very good model at a fraction of the cost than the best model at a premium. For most companies, they're right.

Score: 7.8/10


About the author
Elena Marchetti, Senior AI Editor & Investigative Journalist

Elena is a technology journalist with over eight years of experience covering artificial intelligence, machine learning, and the startup ecosystem.