Alibaba's HappyHorse Claims Video AI Crown - No Weights Yet

A 15B-parameter video model called HappyHorse-1.0 appeared anonymously on the Artificial Analysis Video Arena in early April 2026 and immediately took the top spot in both the text-to-video and image-to-video rankings. Its promotional page described it as "fully open source with commercial licensing." The weights aren't published. The inference code isn't published. The GitHub repository contains a README with the words "coming soon."

The team behind it has since been identified as researchers from Alibaba's newly formed Token Hub unit (ATH), led by Zhang Di - formerly vice president at Kuaishou and the architect of Kling AI. Zhang Di rejoined Alibaba in November 2025. The ATH division itself was only formed in March 2026 under CEO Eddie Wu, consolidating Alibaba's AI units - Tongyi Lab, Qwen, Wukong, and others - into a single group.

The Arena results are real. The open-source claim is not, at least not yet.

TL;DR

Claim: HappyHorse-1.0 is the #1 open-source video model, beating ByteDance Seedance 2.0 by 52 Elo points in image-to-video on Artificial Analysis Video Arena
Our take: The Arena rankings are independently verified through blind user preference tests - the lead is real. The "fully open source" label isn't. No weights, no code, no runnable API exist as of April 15, 2026.
Context: Built by Alibaba's Token Hub (ATH), formed March 2026. Led by the same engineer who built Kling AI at Kuaishou - a model HappyHorse now outranks by 115 Elo points in image-to-video.

What They Showed

HappyHorse-1.0 debuted on the Arena in two anonymous submissions - a V1 and a V2 variant - both listed without affiliation or paper. The model climbed to the top of both leaderboards within days of appearing.

Figure: HappyHorse-1.0 Elo scores versus competitors in text-to-video and image-to-video categories, as recorded by Artificial Analysis Video Arena. Source: help.apiyi.com

The Numbers

Current rankings from the Artificial Analysis Video Arena:

Model	I2V Elo (no audio)	T2V Elo (no audio)
HappyHorse-1.0	1,398	1,362
Dreamina Seedance 2.0	1,346	1,269
grok-imagine-video	1,327	-
PixVerse V6	1,316	1,239
SkyReels V4	1,288	-
Kling 3.0 1080p (Pro)	1,283	1,241

A 52-point gap over Seedance 2.0 in image-to-video is substantial. Arena Elo scores are computed from millions of head-to-head blind comparisons in which users vote without knowing which model produced which output. At that volume, a gap of about 50 points reflects a consistent preference rather than noise.
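To put the gap in concrete terms, here is a minimal sketch that converts the Elo differences in the table into expected head-to-head win rates. It assumes the textbook Elo logistic curve with a 400-point scale; the Arena's exact rating method may differ, so treat the percentages as approximate.

```python
def expected_win_rate(elo_a: float, elo_b: float, scale: float = 400.0) -> float:
    """Textbook Elo expected score for A against B (base-10 logistic curve)."""
    return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / scale))


# Image-to-video Elo values copied from the table above.
i2v_elo = {
    "HappyHorse-1.0": 1398,
    "Dreamina Seedance 2.0": 1346,
    "Kling 3.0 1080p (Pro)": 1283,
}

leader = i2v_elo["HappyHorse-1.0"]
for rival in ("Dreamina Seedance 2.0", "Kling 3.0 1080p (Pro)"):
    gap = leader - i2v_elo[rival]
    win_rate = expected_win_rate(leader, i2v_elo[rival])
    print(f"vs {rival}: +{gap} Elo -> expected win rate {win_rate:.1%}")
```

Under that assumption, a 52-point lead corresponds to winning roughly 57% of blind matchups against Seedance 2.0, and a 115-point lead to roughly 66% against Kling 3.0: a clear preference, though not a sweep.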

The Architecture

HappyHorse-1.0 is a 40-layer unified single-stream Transformer. Unlike competitors that run separate models for video generation and audio synthesis, HappyHorse processes text, image, video, and audio tokens together in a single inference pass. That means native lip-sync rather than post-processing alignment - the audio is created with the video, not stapled on afterward.

The model uses 8-step DMD-2 (Distribution Matching Distillation v2) sampling without classifier-free guidance, which cuts inference time substantially. Claimed throughput is a 5-second 1080p clip in approximately 38 seconds on a single H100 GPU. The team also claims native lip-sync across seven languages: Mandarin, Cantonese, English, Japanese, Korean, German, and French.
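A rough way to see why few-step sampling without classifier-free guidance (CFG) matters is to count forward passes through the network. The sketch below assumes CFG costs two passes per step (one conditional, one unconditional). The 50-step CFG baseline is a hypothetical comparison point, not a published HappyHorse figure.

```python
def network_evaluations(steps: int, uses_cfg: bool) -> int:
    """Forward passes per sample; CFG adds an unconditional pass at every step."""
    return steps * (2 if uses_cfg else 1)


def compute_seconds_per_video_second(wall_clock_s: float, clip_s: float) -> float:
    """How many seconds of GPU time one second of output video costs."""
    return wall_clock_s / clip_s


# Claimed configuration from the article: 8 steps, no CFG.
distilled = network_evaluations(steps=8, uses_cfg=False)

# Hypothetical undistilled baseline (assumption for illustration only).
baseline = network_evaluations(steps=50, uses_cfg=True)

print(f"distilled: {distilled} passes, baseline: {baseline} passes "
      f"({baseline / distilled:.1f}x fewer)")

# Claimed throughput: 5-second clip in about 38 seconds on one H100.
ratio = compute_seconds_per_video_second(wall_clock_s=38, clip_s=5)
print(f"claimed cost: {ratio:.1f} s of H100 time per second of 1080p video")
```

Even at the claimed speed, generation runs several times slower than real time, so the efficiency gain is over other samplers rather than a move toward live video.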

Figure: The HappyHorse-1.0 model card as it appeared during Arena submission - 40-layer architecture, 8-step denoising, T2V and I2V top-1 rankings. Source: help.apiyi.com

For a 15B model hitting these numbers, the compute efficiency story is worth taking seriously. Seedance 2.0 comes from ByteDance, one of the best-resourced video AI teams in the world and a company that has faced months of pressure from Hollywood copyright disputes. Beating it by this margin, with a model this size, suggests either a better architecture or a much better training dataset.

What We Tried

The model's official page at happyhorse.mobi lists four artifacts as released: base model weights, distilled model weights, super-resolution module, and inference code. Standard open-source package.

None of those exist publicly.

As of April 15, 2026:

The GitHub repository CalvintheBear/HappyHorse-1.0 exists with a README, no code, no weights, and "coming soon" placeholders throughout
Hugging Face shows no model card, no weight files, and no license file under the happyhorse-ai namespace
The official website runs a functional demo - confirming the model works in production - but offers no download path
No public API tier is announced; Alibaba ATH confirmed "private internal testing" status

Figure: The sequence of events: anonymous Arena submission, leaderboard peak, brief delisting, then re-listing after the Alibaba attribution became public knowledge. Source: help.apiyi.com

Reproducibility Table

What You Need	Current Status
Model weights	Not published
Distilled version	Not published
Super-resolution module	Not published
Inference code	Not published
API (public)	Not available
Minimum hardware (expected)	H100 or A100 with 48GB+ VRAM
License	Apache 2.0 (claimed but no license file exists)
Demo (web only)	Available at happyhorse.mobi
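The expected hardware row can be sanity-checked with back-of-envelope arithmetic on weight storage alone. This is a sketch under stated assumptions: the serving precision has not been disclosed, and activations, the text encoder, and video latents all need memory on top of the weights.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory for the weights only, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


N_PARAMS = 15e9  # 15B parameters, as stated in the article

# Candidate precisions; which one HappyHorse uses is an assumption.
for precision, bytes_per_param in (("fp32", 4), ("bf16/fp16", 2), ("int8", 1)):
    gb = weight_memory_gb(N_PARAMS, bytes_per_param)
    print(f"{precision:>9}: {gb:.0f} GB for weights alone")
```

At 16-bit precision the weights alone take about 30 GB, which is consistent with a 48GB+ requirement once working memory is included. Since the A100 ships in 40GB and 80GB variants, that figure in practice points to the 80GB cards.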

Calling something "open source" while publishing zero of the required artifacts is a marketing decision, not a technical one. WaveSpeedAI catalogued the same discrepancy shortly after the model appeared: "HappyHorse-1.0 currently fits the third category - demo access without downloadable materials."

The Gap

Two issues separate the claim from the current state.

"Open Source" Without Artifacts Isn't Open Source

Open source has a specific meaning in the developer community: published weights, published code, a real license. "Open weights" is a narrower claim, and "open access demo" is narrower still. HappyHorse is currently in the third category.

This matters practically. Without weights, developers can't run the model locally, can't fine-tune it, can't audit what the training data contained, and can't build production pipelines that aren't dependent on Alibaba's infrastructure timeline. The Apache 2.0 license claim is meaningless until there's something to apply it to.

Alibaba has a track record of eventually shipping open weights - the Qwen series has been truly open and has driven real adoption. But HappyHorse launched with the "open source" framing front and center, before it was true.

The Benchmark Has a Composition Problem

The Artificial Analysis Video Arena Elo ratings are computed from user preference votes. The composition of the test set matters. One independent analysis found that portrait generation and dubbing scenarios account for over 60% of Arena test cases - exactly the domain where HappyHorse's single-pass audio-video approach has the most obvious advantage.

For dynamic scenes, complex backgrounds, and professional video production content, the model's edge over Seedance 2.0 is less clear. That doesn't mean the ranking is fraudulent. But a model optimized for talking-head video will naturally score higher on an Arena with heavy talking-head representation.
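The composition effect can be illustrated with a small mixture calculation. The per-category win rates below are invented purely for illustration; nobody has published a category-level breakdown. The Elo conversion assumes the textbook 400-point logistic curve and a single head-to-head pairing, which simplifies how the Arena actually rates models.

```python
import math


def blended_win_rate(share_a: float, win_a: float, win_b: float) -> float:
    """Overall win rate when a fraction share_a of prompts come from category A."""
    return share_a * win_a + (1.0 - share_a) * win_b


def implied_elo_gap(win_rate: float, scale: float = 400.0) -> float:
    """Elo difference that would produce this head-to-head win rate."""
    return scale * math.log10(win_rate / (1.0 - win_rate))


# Hypothetical win rates (assumptions, not measurements).
TALKING_HEAD_WIN = 0.62  # portrait and dubbing prompts
OTHER_WIN = 0.50         # dynamic scenes, complex backgrounds

# 0.6 mirrors the "over 60%" share cited above; 0.3 is a counterfactual mix.
for talking_head_share in (0.6, 0.3):
    p = blended_win_rate(talking_head_share, TALKING_HEAD_WIN, OTHER_WIN)
    print(f"talking-head share {talking_head_share:.0%}: "
          f"overall win rate {p:.1%}, implied gap {implied_elo_gap(p):+.0f} Elo")
```

With these made-up inputs, the same model shows a gap of about 50 Elo under the talking-head-heavy mix and about 25 under the counterfactual one. The point is only that the headline number depends on what the Arena asks models to generate.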

The brief period where both HappyHorse V1 and V2 were delisted from the Arena before reappearing is worth noting. No official explanation was given. The unusual double-anonymous-submission pattern - two variants appearing simultaneously without affiliation - is not standard Arena procedure.

The Arena lead is real. Zhang Di knows how to build video models; his work on Kling at Kuaishou produced one of the most capable video generation systems available before he left. The unified Transformer architecture that produces audio and video in a single pass is technically interesting and, if the weights eventually ship, worth serious evaluation.

When the team publishes actual weights, this becomes a genuine open-source milestone. Until then, HappyHorse-1.0 is a demo from Alibaba's ATH unit with a leaderboard position and a GitHub repo that says "coming soon." The ATH team confirmed more releases are in the pipeline - for developers, the moment to pay attention is when artifacts land, not when press releases do.
