Grok Imagine Video 1.5

xAI's Grok Imagine Video 1.5 ranked #1 among image-to-video models on Artificial Analysis at its GA launch, generating 720p clips with native audio at $0.14/s - 86% cheaper than Sora 2 Pro.

Overview

Grok Imagine Video 1.5 is xAI's production image-to-video and text-to-video model, released as generally available on June 16-17, 2026. Built on Aurora, xAI's autoregressive video engine trained on 110,000 NVIDIA GB200 GPUs, it produces 720p clips up to 15 seconds with synchronized audio in a single inference pass. As of its GA launch, it holds the #1 spot on the Artificial Analysis Image-to-Video Arena leaderboard with an Elo of 1,473 - a 52-point jump over version 1.0, which Artificial Analysis noted as the largest single-version gain recorded on that board.

TL;DR

  • Best-in-class image-to-video: #1 on Artificial Analysis i2v Arena at Elo 1,473 at GA launch, ahead of Veo 3.1, Kling 3.0, and Seedance 2.0
  • 720p clips up to 15 seconds, native synchronized audio at no surcharge, and a 6-second clip generated in ~25 seconds via the Fast variant
  • API at $0.14/s for 720p (xAI quotes $4.20/min; 60 seconds at the per-second rate comes to $8.40) - 86% cheaper than Sora 2 Pro, 65% cheaper than Veo 3.1

The model went from zero to first place in the i2v category in roughly ten months. xAI launched Grok Imagine Video in August 2025, updated it to version 1.0 in February 2026 (10-second clips, 720p, improved audio), and shipped 1.5 as a preview on May 31, 2026, with GA following in mid-June. In the 30 days following the 1.0 release, users generated 1.245 billion videos, giving xAI a large feedback pool for training the next iteration. Version 1.5 extends max clip length from 10 to 15 seconds, improves physics and motion coherence, and ships a Fast variant - a speed-optimized build now live on grok.com and the iOS/Android apps.

The model gained mainstream attention when Elon Musk posted an AI-made Iliad trailer built with Grok Imagine 1.5 on June 4, 2026. The 40-second clip drew 18.4 million views on X, the kind of viral visibility no benchmark can replicate. Engagement alone doesn't validate the model's quality metrics, but the clip did show its capacity for cinematic composition.

  • August 2025 - Grok Imagine Video launches as xAI's first video generation product.

  • January 28, 2026 - Grok Imagine API launches: text-to-video, image-to-video, and video editing at $0.05/second.

  • February 2026 - Version 1.0 released. Clips extend to 10 seconds, 720p output, improved audio. 1.245 billion videos produced in the first 30 days.

  • May 31, 2026 - Version 1.5 enters preview via the xAI API. Elo 1,404 on day one, +52 over v1.0.

  • June 16-17, 2026 - Version 1.5 goes GA. Fast variant rolls out on grok.com and mobile apps.

Key Specifications

Specification | Details
Provider | xAI
Model Family | Grok Imagine
Engine | Aurora (autoregressive, not diffusion-based)
Parameters | Not disclosed
Max Clip Length | 15 seconds (video editing capped at 8.7s)
Frame Rate | 24 fps
Resolutions | 480p, 720p, 1080p (1080p is i2v only)
Aspect Ratios | 7 options: 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3
Audio | Native one-pass generation (dialogue, SFX, music)
API Price (480p) | $0.08/second
API Price (720p) | $0.14/second
API Price (1080p) | $0.25/second
API Model ID | grok-imagine-video-1.5
Release Date (GA) | June 16, 2026
Open Source | No
License | Proprietary

Benchmark Performance

The clearest quality signal comes from the Artificial Analysis Video Arena, which collects blind pairwise votes from users comparing videos produced from the same prompt. As of June 24, 2026, the rankings on the image-to-video, with-audio leaderboard are:

Model | Elo (i2v + audio) | Provider
Dreamina Seedance 2.0 720p | 1,194 | ByteDance Seed
HappyHorse-1.1 | 1,121 | Alibaba-ATH
Grok Imagine Video 1.5 preview | 1,111 | xAI
Wan 2.7 | 1,090 | Alibaba
Veo 3.1 | 1,087 | Google
Grok Imagine Video 1.0 | 1,081 | xAI
Kling 3.0 1080p (Pro) | 1,072 | Kuaishou

Note: these are live scores and shift as more votes accumulate. The Elo of 1,473 cited in xAI's announcement and the buildfastwithai review reflects the Arena leaderboard at or shortly after GA launch, when vote counts were lower and confidence intervals wider. The current score of 1,111 in the Artificial Analysis with-audio board reflects a larger vote pool. The v1.5 model still sits #3 overall - above Veo 3.1 and Kling 3.0.
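To read the gaps between these scores, it can help to convert an Elo difference into an expected head-to-head preference rate. The sketch below assumes the standard logistic Elo formula with a 400-point scale; the exact rating method Artificial Analysis uses is not stated here, so treat the results as rough guidance. The function name is illustrative.

```python
def expected_win_rate(elo_a: float, elo_b: float, scale: float = 400.0) -> float:
    """Expected share of pairwise votes model A wins against model B.

    Assumes the conventional Elo logistic curve; the leaderboard's own
    method may differ.
    """
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / scale))


# Scores from the with-audio leaderboard quoted above (June 24, 2026).
grok_15_preview = 1111
veo_31 = 1087
seedance_20 = 1194

print(f"Grok 1.5 vs Veo 3.1:      {expected_win_rate(grok_15_preview, veo_31):.2f}")
print(f"Grok 1.5 vs Seedance 2.0: {expected_win_rate(grok_15_preview, seedance_20):.2f}")
```

Under that assumption, a 24-point lead over Veo 3.1 corresponds to winning only slightly more than half of blind comparisons (about 53%), while the 83-point deficit to Seedance 2.0 corresponds to winning roughly 38% of them.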

A separate data point: when the preview launched on May 31, the model entered the Arena at Elo 1,404 with a ±6 confidence interval - a genuine top-of-board position that held for several weeks before Seedance 2.0 and HappyHorse built up enough votes to push higher.

Figure: Feature and capability comparison of Grok Imagine Video 1.5 vs Kling 3.0, Sora 2 Pro, and Seedance 2.0. Source: buildfastwithai.com

Sora 2 is notably absent from the Artificial Analysis with-audio board. OpenAI discontinued the Sora consumer app on April 26, 2026, and the Sora 2 API remains on a deprecated track until September 24, 2026. That removes a key competitor from the current comparison.

Key Capabilities

Aurora Engine: Autoregressive vs. Diffusion

Most video generation models use diffusion: they denoise all frames simultaneously, starting from noise, then patch them together. Aurora instead produces frames sequentially, conditioning each new frame on all prior ones. That sequential process is what produces stable camera movements and consistent subject positioning across a full 15-second clip - the areas where diffusion models tend to show artifacts past the 5-second mark.
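The control flow of sequential generation can be illustrated with a toy loop. This is not Aurora's implementation, which xAI has not published; `toy_next_frame` is a hypothetical stand-in for a model call, and frames are plain numbers so the example runs without dependencies. It only shows that each step receives every earlier frame as context.

```python
from typing import Callable, List, Sequence


def generate_autoregressive(
    first_frame: float,
    num_frames: int,
    next_frame: Callable[[Sequence[float]], float],
) -> List[float]:
    """Build a clip one frame at a time, conditioning on all prior frames."""
    frames = [first_frame]
    while len(frames) < num_frames:
        # The full history is passed in, so each new frame can stay
        # consistent with everything generated so far.
        frames.append(next_frame(frames))
    return frames


def toy_next_frame(history: Sequence[float]) -> float:
    """Hypothetical 'model': the running mean of the history plus a fixed step."""
    return sum(history) / len(history) + 1.0


clip = generate_autoregressive(first_frame=0.0, num_frames=4, next_frame=toy_next_frame)
print(clip)
```

At 24 fps, a 15-second clip is 360 frames, so a real autoregressive model repeats this step hundreds of times with a growing context.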

The architecture also processes audio tokens alongside video tokens in the same forward pass. Sound effects, dialogue, and ambient music sync with visual events because both modalities share latent representations during generation rather than being stitched together post-hoc. This one-pass approach is a real engineering distinction from competing models: Veo 3.1 and Kling 3.0 require separate audio pipeline steps (or charge extra for it). Grok Imagine Video 1.5 includes audio at no surcharge at every tier.

Generation Modes

Grok Imagine Video 1.5 supports five modes: text-to-video, image-to-video, reference-to-video (up to seven reference images), edit-video, and extend-video. Image-to-video is where the model performs best in blind evaluations. Text-to-video is available through the consumer interface on grok.com but not through the API, so developers who need t2v programmatically will need to route to a different model.

The extend-video mode chains clips from the final frame, enabling sequences beyond 15 seconds. Face consistency degrades across multiple extensions, which is a known limitation documented by independent reviewers.
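For planning purposes, the number of extension passes a longer sequence needs can be estimated up front. The sketch below assumes each extension adds up to the same 15-second ceiling as a base clip; that per-extension length is an assumption, not a documented figure, so pass a different `seconds_per_extension` if your results differ. The function name is illustrative.

```python
import math

MAX_CLIP_SECONDS = 15.0  # per-clip ceiling stated in this article


def extensions_needed(
    target_seconds: float,
    seconds_per_extension: float = MAX_CLIP_SECONDS,  # assumed, not documented
) -> int:
    """Estimate how many extend-video passes a target duration requires."""
    if target_seconds <= 0 or seconds_per_extension <= 0:
        raise ValueError("durations must be positive")
    if target_seconds <= MAX_CLIP_SECONDS:
        return 0  # fits in a single base clip
    remaining = target_seconds - MAX_CLIP_SECONDS
    return math.ceil(remaining / seconds_per_extension)


print(extensions_needed(40))  # base clip plus two extensions
print(extensions_needed(60))  # base clip plus three extensions
```

Since face consistency degrades as extensions accumulate, keeping this count low is a reasonable design goal for sequences that feature people.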

The Fast Variant

Video 1.5 Fast generates a 6-second 720p clip in roughly 25 seconds, down from 40+ seconds in version 1.0. It's the default in the consumer apps and designed for real-time workflows. The standard API variant takes longer but focuses on output consistency. Both use the same underlying Aurora engine; the Fast variant is a quality-speed tradeoff, not a separate model.

Figure: API per-second pricing across resolution tiers. Source: buildfastwithai.com

Pricing and Availability

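At xAI's quoted rate of $4.20/minute for 720p, Grok Imagine Video 1.5 costs 86% less than Sora 2 Pro and 65% less than Veo 3.1, with audio included at every tier. Note that the $0.14/second list price works out to $8.40 for a full 60 seconds of output.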

The API uses per-second billing with no minimum:

  • 480p: $0.08/second ($4.80/minute)
  • 720p: $0.14/second ($8.40/minute; xAI quotes $4.20/minute in marketing, which corresponds to 30 seconds of output - verify against your actual usage pattern)
  • 1080p: $0.25/second ($15.00/minute); limited to image-to-video only
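The per-second rates above make clip costs easy to check. The following is a minimal sketch using only the list prices from this article; the function name is illustrative, and how xAI rounds fractional seconds on an invoice is an assumption here (the sketch rounds to the nearest cent).

```python
from decimal import Decimal

# Per-second list prices quoted in this article.
PRICE_PER_SECOND = {
    "480p": Decimal("0.08"),
    "720p": Decimal("0.14"),
    "1080p": Decimal("0.25"),  # image-to-video only
}


def clip_cost(resolution: str, seconds: float) -> Decimal:
    """Return the estimated cost in USD for one clip, rounded to cents."""
    if resolution not in PRICE_PER_SECOND:
        raise ValueError(f"unknown resolution: {resolution}")
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    cost = PRICE_PER_SECOND[resolution] * Decimal(str(seconds))
    return cost.quantize(Decimal("0.01"))


print(clip_cost("720p", 15))  # one maximum-length clip
print(clip_cost("720p", 30))  # matches the $4.20 marketing figure
print(clip_cost("720p", 60))  # a full minute at the list rate
```

A maximum-length 15-second clip at 720p comes to $2.10, and the 30-second and 60-second cases reproduce the $4.20 and $8.40 figures discussed above.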

Audio is included at all tiers. Rate limit is 60 requests per minute. Available in us-east-1, eu-west-1, and us-west-2.
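A client that submits many jobs may want to stay under the 60 requests/minute limit on its own side. The sketch below spaces calls evenly (one per second at the default), which is a conservative choice; whether the service counts requests in fixed or sliding windows is not stated here, and the class name is illustrative. The clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, Optional


class MinIntervalThrottle:
    """Space calls evenly so at most `max_per_minute` start in any minute."""

    def __init__(
        self,
        max_per_minute: int = 60,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if max_per_minute <= 0:
            raise ValueError("max_per_minute must be positive")
        self._interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None

    def wait(self) -> float:
        """Block until the next call may start; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._clock()  # re-read in case the sleep overshot
        self._next_allowed = now + self._interval
        return delay
```

Calling `throttle.wait()` immediately before each request submission is enough; the first call returns at once and later calls pause only when they arrive too early.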

Consumer access comes in four tiers. A free account gets 5 credits per day on grok.com (accessible via the Imagine tab). SuperGrok Lite ($10/month) provides 480p clips of up to 6 seconds. SuperGrok ($30/month) enables 720p and 15-second clips. X Premium+ ($40/month) includes platform-integrated access.

The API model ID is grok-imagine-video-1.5. An alias grok-imagine-video-1.5-2026-05-30 is also supported. Access via the xAI console at console.x.ai. The xAI Python SDK handles asynchronous polling automatically; REST API users need to poll every 5 seconds using the returned request_id.
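For REST users, the polling step can be sketched as a small loop. The example below deliberately avoids naming any endpoint: `fetch_status` is a hypothetical callable you would implement against the xAI REST API, and the `"status"` key with values `"done"` and `"failed"` is an assumed response shape, not a documented one. Only the 5-second interval and the use of `request_id` come from the description above.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    request_id: str,
    fetch_status: Callable[[str], Dict[str, Any]],  # hypothetical HTTP wrapper
    interval_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a generation job until it finishes, fails, or times out."""
    waited = 0.0
    while True:
        result = fetch_status(request_id)
        status = result.get("status")  # assumed field name
        if status == "done":
            return result
        if status == "failed":
            raise RuntimeError(f"generation {request_id} failed: {result}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"generation {request_id} still pending after {waited:.0f}s")
        sleep(interval_seconds)
        waited += interval_seconds
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any HTTP library and lets it be tested with canned responses; adjust the status checks to whatever the real response contains.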

For broader context on AI video generation costs, our video generation benchmarks leaderboard tracks pricing across the main providers.

Strengths and Weaknesses

Strengths

  • #1 or top-3 on Artificial Analysis i2v arena across most measurement windows
  • The only major video API to include native one-pass audio at no surcharge
  • Aurora's autoregressive architecture produces stable motion across full 15-second clips without the typical diffusion-model jerkiness
  • 86% cheaper than Sora 2 Pro at 720p, with Sora 2 now on a deprecated track
  • 1080p available for image-to-video (listed in pricing, though not emphasized in consumer docs)
  • Strong video extension capability for sequences beyond 15 seconds
  • Available on iOS, Android, and web in addition to API

Weaknesses

  • 720p ceiling for standard use - Kling 3.0 and Veo 3.1 both reach 1080p in their core offerings
  • No text-to-video via API - only through the consumer interface
  • Face consistency degrades across multiple video extension passes (typically after 5-6 chained extensions)
  • 24 fps only - no 60 fps for gaming, sports, or smooth product demos
  • Aurora creates frames sequentially, so prompt structure matters more than in diffusion models: actions described early appear early, buried actions may be skipped
  • Free tier (5 credits/day) is restrictive for meaningful testing

✓ Last verified June 24, 2026

About the author: James Kowalski, AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.