Grok Imagine Video 1.5
xAI's Grok Imagine Video 1.5 launched at #1 on the Artificial Analysis image-to-video leaderboard, generating 720p clips with native audio at $0.14/s - a rate cited as 86% cheaper than Sora 2 Pro.

Overview
Grok Imagine Video 1.5 is xAI's production image-to-video and text-to-video model, which reached general availability on June 16-17, 2026. Built on Aurora, xAI's autoregressive video engine trained on 110,000 NVIDIA GB200 GPUs, it produces 720p clips up to 15 seconds with synchronized audio in a single inference pass. At its GA launch, it held the #1 spot on the Artificial Analysis Image-to-Video Arena leaderboard with an Elo of 1,473 - a 52-point jump over version 1.0, which Artificial Analysis noted as the largest single-version gain recorded on that board.
TL;DR
- Top-tier image-to-video: #1 on Artificial Analysis i2v Arena at Elo 1,473 at GA launch, ahead of Veo 3.1, Kling 3.0, and Seedance 2.0 at that time
- 720p clips up to 15 seconds, native synchronized audio at no surcharge, and a 6-second clip generated in ~25 seconds via the Fast variant
- API at $0.14/s for 720p ($8.40/min) - cited as 86% cheaper than Sora 2 Pro and 65% cheaper than Veo 3.1
The model went from zero to first place in the i2v category in roughly ten months. xAI launched Grok Imagine Video in August 2025, updated it to version 1.0 in February 2026 (10-second clips, 720p, improved audio), and shipped 1.5 between late May (preview) and mid-June 2026 (GA). In the 30 days following the 1.0 release, users generated 1.245 billion videos, giving xAI a large feedback pool for training the next iteration. Version 1.5 extends max clip length from 10 to 15 seconds, improves physics and motion coherence, and ships a Fast variant - a speed-optimized build now live on grok.com and the iOS/Android apps.
The model gained mainstream attention when Elon Musk posted an AI-made Iliad trailer built with Grok Imagine 1.5 on June 4, 2026. The 40-second clip drew 18.4 million views on X. View counts do not validate the model's quality metrics, but the trailer did show what the model can do with cinematic composition.
- August 2025: Grok Imagine Video launches as xAI's first video generation product.
- January 28, 2026: Grok Imagine API launches with text-to-video, image-to-video, and video editing at $0.05/second.
- February 2026: Version 1.0 released. Clips extend to 10 seconds, 720p output, improved audio. 1.245 billion videos produced in the first 30 days.
- May 31, 2026: Version 1.5 enters preview via the xAI API. Elo 1,404 on day one, +52 over v1.0.
- June 16-17, 2026: Version 1.5 goes GA. Fast variant rolls out on grok.com and mobile apps.
Key Specifications
| Specification | Details |
|---|---|
| Provider | xAI |
| Model Family | Grok Imagine |
| Engine | Aurora (autoregressive, not diffusion-based) |
| Parameters | Not disclosed |
| Max Clip Length | 15 seconds (video editing capped at 8.7s) |
| Frame Rate | 24 fps |
| Resolutions | 480p, 720p; 1080p for image-to-video only |
| Aspect Ratios | 7 options: 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3 |
| Audio | Native one-pass generation (dialogue, SFX, music) |
| API Price (480p) | $0.08/second of output |
| API Price (720p) | $0.14/second of output |
| API Price (1080p) | $0.25/second of output |
| API Model ID | grok-imagine-video-1.5 |
| Release Date (GA) | June 16, 2026 |
| Open Source | No |
| License | Proprietary |
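The limits in the table can be turned into a pre-flight check before a request is sent. The following is a minimal sketch, not part of any xAI SDK: the function name `check_request` and the use of the mode names as plain strings are assumptions made for illustration, while the numeric limits are copied from the table above.

```python
from typing import List

# Limits transcribed from the specification table. The function and the
# string mode names are illustrative assumptions, not an xAI interface.
MAX_CLIP_SECONDS = 15.0
MAX_EDIT_SECONDS = 8.7
RESOLUTIONS = {"480p", "720p", "1080p"}
ASPECT_RATIOS = {"1:1", "16:9", "9:16", "4:3", "3:4", "3:2", "2:3"}


def check_request(mode: str, resolution: str, aspect_ratio: str, seconds: float) -> List[str]:
    """Return a list of problems; an empty list means the request fits the table."""
    problems = []
    if resolution not in RESOLUTIONS:
        problems.append(f"unsupported resolution: {resolution}")
    elif resolution == "1080p" and mode != "image-to-video":
        problems.append("1080p is listed for image-to-video only")
    if aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {aspect_ratio}")
    # Video editing has a shorter cap than generation.
    limit = MAX_EDIT_SECONDS if mode == "edit-video" else MAX_CLIP_SECONDS
    if not 0 < seconds <= limit:
        problems.append(f"duration must be in (0, {limit}] seconds for {mode}")
    return problems
```

Calling `check_request("edit-video", "720p", "16:9", 10)` reports the duration problem, since editing is capped at 8.7 seconds.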
Benchmark Performance
The clearest quality signal comes from the Artificial Analysis Video Arena, which collects blind pairwise votes from users comparing videos generated from the same prompt. As of June 24, 2026, the image-to-video (with audio) leaderboard ranks the models as follows:
| Model | Elo (i2v + audio) | Provider |
|---|---|---|
| Dreamina Seedance 2.0 720p | 1,194 | ByteDance Seed |
| HappyHorse-1.1 | 1,121 | Alibaba-ATH |
| Grok Imagine Video 1.5 preview | 1,111 | xAI |
| Wan 2.7 | 1,090 | Alibaba |
| Veo 3.1 | 1,087 | Google DeepMind |
| Grok Imagine Video 1.0 | 1,081 | xAI |
| Kling 3.0 1080p (Pro) | 1,072 | Kuaishou |
Note: these are live scores and shift as more votes accumulate. The Elo of 1,473 cited in xAI's announcement and the buildfastwithai review reflects the Arena leaderboard at or shortly after GA launch, when vote counts were lower and confidence intervals wider. The current score of 1,111 in the Artificial Analysis with-audio board reflects a larger vote pool. The v1.5 model still sits #3 overall - above Veo 3.1 and Kling 3.0.
A separate data point: when the preview launched on May 31, the model entered the Arena at Elo 1,404 with a ±6 confidence interval - a genuine top-of-board position that held for several weeks before Seedance 2.0 and HappyHorse built up enough votes to push higher.
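To get a feel for what these Elo gaps mean, a rating difference can be converted into an expected head-to-head preference rate. The sketch below assumes the conventional logistic Elo formula with a 400-point scale; Artificial Analysis may fit its ratings differently, so treat the result as a rough reading rather than their method.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected share of pairwise votes for A over B under the standard Elo model.

    The 400-point scale is the conventional default and an assumption here.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Ratings taken from the with-audio table above.
seedance_vs_grok = expected_win_rate(1194, 1111)
grok_15_vs_grok_10 = expected_win_rate(1111, 1081)
print(f"Seedance 2.0 over Grok 1.5: {seedance_vs_grok:.2f}")
print(f"Grok 1.5 over Grok 1.0:     {grok_15_vs_grok_10:.2f}")
```

Under this assumption, an 83-point gap corresponds to roughly a 62% preference rate, and the 30-point gap between the two Grok versions to a little over 54%.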
Figure: Feature and capability comparison of Grok Imagine Video 1.5 vs Kling 3.0, Sora 2 Pro, and Seedance 2.0. Source: buildfastwithai.com
Sora 2 is notably absent from the Artificial Analysis with-audio board. OpenAI discontinued the Sora consumer app on April 26, 2026, and the Sora 2 API is on a deprecation track until September 24, 2026. That removes a key competitor from the current comparison.
Key Capabilities
Aurora Engine: Autoregressive vs. Diffusion
Most video generation models use diffusion: they denoise all frames simultaneously from random noise, then assemble the result. Aurora instead generates frames sequentially, conditioning each new frame on all prior ones. That sequential process is what produces stable camera movement and consistent subject positioning across a full 15-second clip, two areas where diffusion models tend to show artifacts past the 5-second mark.
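The control flow of sequential generation can be illustrated with a toy loop. This is only a sketch of the conditioning structure described above, not Aurora's implementation: the two-number "frame" and the `steady_motion` stand-in are hypothetical, with the stand-in playing the role a learned model would play.

```python
from typing import Callable, List

Frame = List[float]  # toy frame: [subject_x, subject_y]


def generate_autoregressive(first_frame: Frame, num_frames: int,
                            next_frame: Callable[[List[Frame]], Frame]) -> List[Frame]:
    """Generate frames one at a time; each new frame sees every earlier frame."""
    frames = [first_frame]
    while len(frames) < num_frames:
        frames.append(next_frame(frames))
    return frames


def steady_motion(history: List[Frame]) -> Frame:
    """Hypothetical stand-in for a model: continue the average motion seen so far."""
    if len(history) < 2:
        velocity = 1.0  # no motion observed yet, so pick a default step
    else:
        velocity = (history[-1][0] - history[0][0]) / (len(history) - 1)
    return [history[-1][0] + velocity, history[-1][1]]


clip = generate_autoregressive([0.0, 0.0], num_frames=5, next_frame=steady_motion)
print(clip)
```

Because every step reads the full history, the subject keeps moving at a consistent rate, which is the intuition behind the stability claim; a real model would condition on learned representations rather than coordinates.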
The architecture also processes audio tokens alongside video tokens in the same forward pass. Sound effects, dialogue, and ambient music sync with visual events because both modalities share latent representations during generation rather than being stitched together post-hoc. This one-pass approach is a real engineering distinction from competing models: Veo 3.1 and Kling 3.0 require separate audio pipeline steps (or charge extra for it). Grok Imagine Video 1.5 includes audio at no surcharge at every tier.
Generation Modes
The model supports five modes: text-to-video, image-to-video, reference-to-video (up to seven reference images), edit-video, and extend-video. Image-to-video is where it performs best in blind evaluations. For version 1.5, text-to-video is available via the consumer interface on grok.com but not through the API, so developers who need t2v programmatically will need to route to a different model.
The extend-video mode chains clips from the final frame, enabling sequences beyond 15 seconds. Face consistency degrades across multiple extensions, which is a known limitation documented by independent reviewers.
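For planning longer sequences, it helps to estimate how many extension passes a target length needs and whether that count enters the range where face drift is reported. The sketch below assumes each call, including each extension, contributes up to 15 seconds; the source does not state the per-extension length, so that figure and the helper name `plan_extensions` are assumptions.

```python
import math
from typing import Tuple

MAX_CLIP_SECONDS = 15        # max clip length from the specification table
DEGRADATION_THRESHOLD = 5    # low end of the "5-6 chains" reported for face drift


def plan_extensions(target_seconds: float,
                    seconds_per_call: float = MAX_CLIP_SECONDS) -> Tuple[int, bool]:
    """Return (extend_calls, face_drift_risk) for a sequence of target_seconds.

    Assumes the first call and every extension each add seconds_per_call
    seconds; the real per-extension length may differ.
    """
    if target_seconds <= 0 or seconds_per_call <= 0:
        raise ValueError("durations must be positive")
    total_calls = math.ceil(target_seconds / seconds_per_call)
    extend_calls = total_calls - 1  # the first call is a normal generation
    return extend_calls, extend_calls >= DEGRADATION_THRESHOLD
```

Under these assumptions a 40-second sequence needs two extensions, well short of the range where degradation is reported, while a 90-second sequence needs five and is flagged.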
The Fast Variant
Video 1.5 Fast generates a 6-second 720p clip in roughly 25 seconds, down from 40+ seconds in version 1.0. It's the default in the consumer apps and is designed for quick-iteration workflows. The standard API variant takes longer but prioritizes output consistency. Both use the same underlying Aurora engine; the Fast variant is a quality-speed tradeoff, not a separate model.
Figure: API per-second pricing across resolution tiers. Source: buildfastwithai.com
Pricing and Availability
At $0.14/second for 720p, Grok Imagine Video 1.5 is cited as costing 86% less than Sora 2 Pro and 65% less than Veo 3.1, with audio included at every tier.
The API uses per-second billing with no minimum:
- 480p: $0.08/second ($4.80/minute)
- 720p: $0.14/second ($8.40/minute; the $4.20 figure in xAI's marketing equals 30 seconds at this rate, not a full minute - verify against your actual usage pattern)
- 1080p: $0.25/second ($15.00/minute); limited to image-to-video only
Audio is included at all tiers. Rate limit is 60 requests per minute. Available in us-east-1, eu-west-1, and us-west-2.
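The per-second rates make cost estimates a one-line calculation. The following sketch uses the listed prices with `Decimal` to avoid floating-point rounding in currency math; the helper name `clip_cost` is illustrative only.

```python
from decimal import Decimal

# Per-second API prices as listed above (USD).
PRICE_PER_SECOND = {
    "480p": Decimal("0.08"),
    "720p": Decimal("0.14"),
    "1080p": Decimal("0.25"),
}


def clip_cost(resolution: str, seconds: float) -> Decimal:
    """Cost in USD of generating `seconds` of video at `resolution`."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return PRICE_PER_SECOND[resolution] * Decimal(str(seconds))


for resolution in PRICE_PER_SECOND:
    print(resolution, "per minute:", clip_cost(resolution, 60))
print("720p, 15-second clip:", clip_cost("720p", 15))
```

At the 720p rate, 30 seconds comes to $4.20 and a full minute to $8.40, which is the distinction worth checking when comparing quoted per-minute figures.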
Consumer access comes in four tiers. A free account gets 5 credits per day on grok.com (accessible via the Imagine tab). SuperGrok Lite ($10/month) provides 480p clips up to 6 seconds. SuperGrok ($30/month) enables 720p and 15-second clips. X Premium+ ($40/month) includes platform-integrated access.
The API model ID is grok-imagine-video-1.5. An alias grok-imagine-video-1.5-2026-05-30 is also supported. Access via the xAI console at console.x.ai. The xAI Python SDK handles asynchronous polling automatically; REST API users need to poll every 5 seconds using the returned request_id.
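For REST users, the polling step might look like the sketch below. It deliberately takes the HTTP call as a caller-supplied function, because the endpoint path and response schema are not given here: `fetch_status`, the `"status"` key, and the `"done"`/`"failed"` values are all assumptions to be replaced with whatever the xAI docs specify. Only the 5-second interval and the `request_id` come from the text above.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(fetch_status: Callable[[str], Dict[str, Any]],
                    request_id: str,
                    interval_seconds: float = 5.0,
                    timeout_seconds: float = 600.0,
                    sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Call fetch_status(request_id) every interval_seconds until a terminal state.

    The "status" key and its "done"/"failed" values are hypothetical; adapt
    them to the actual response format.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        result = fetch_status(request_id)
        if result.get("status") in ("done", "failed"):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"request {request_id} did not finish in {timeout_seconds}s")
        sleep(interval_seconds)
```

Injecting `fetch_status` and `sleep` keeps the loop testable without network access, and the explicit timeout avoids polling forever if a job stalls.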
For broader context on AI video generation costs, our video generation benchmarks leaderboard tracks pricing across the main providers.
Strengths and Weaknesses
Strengths
- #1 or top-3 on Artificial Analysis i2v arena across most measurement windows
- Only major video API to include native one-pass audio at no surcharge
- Aurora's autoregressive architecture produces stable motion across full 15-second clips without the typical diffusion-model jerkiness
- 86% cheaper than Sora 2 Pro at 720p, with Sora 2 now on a deprecation track
- 1080p available for image-to-video (listed in pricing, though not emphasized in consumer docs)
- Strong video extension capability for sequences beyond 15 seconds
- Available on iOS, Android, and web in addition to API
Weaknesses
- 720p ceiling for standard use - Kling 3.0 and Veo 3.1 both reach 1080p in their core offerings
- No text-to-video via API - only through the consumer interface
- Face consistency degrades across multiple video extension passes (typically after 5-6 chains)
- 24 fps only - no 60 fps for gaming, sports, or smooth product demos
- Aurora creates frames sequentially, so prompt structure matters more than in diffusion models: actions described early appear early, buried actions may be skipped
- Free tier (5 credits/day) is restrictive for meaningful testing
Related Coverage
- Video Generation Benchmarks Leaderboard - full rankings across all major video AI models
- Veo 3.1 - Google DeepMind's competing model, top-ranked in several text-to-video categories
- Grok 4 - xAI's flagship language model
- AI Image Generation Leaderboard - for still image generation rankings
Sources:
- xAI: Grok Imagine Video 1.5 announcement
- Artificial Analysis Image-to-Video Leaderboard
- Artificial Analysis: grok-imagine-video model page
- xAI Docs: Video Generation
- xAI Docs: Grok Imagine Video 1.5 Preview model card
- Grok Imagine Video 1.5 - grok.ai.org arena analysis
- TechTimes: Grok Imagine Video 1.5 Goes Live - 86% Below Sora
- ExplainX: Grok Imagine Video 1.5 - xAI Launches #1 AI Video Generator
- The Decoder: xAI updates Grok Imagine to 1.5
- Build Fast With AI: Grok Imagine Video 1.5 Review 2026
- Storyboard18: Elon Musk posts AI-generated Iliad trailer
- xAI on X: Introducing Grok Imagine 1.0
✓ Last verified June 24, 2026
