Helios is a 14-billion-parameter open-source video generation model from Peking University and ByteDance, released on March 4, 2026. It runs at 19.5 frames per second on a single NVIDIA H100 GPU - matching the throughput of 1.3B distilled models while delivering the quality you'd expect from a full-scale 14B architecture.

TL;DR

14B open-source video model running at 19.5 FPS on one H100 - same speed as models one-tenth its size
Supports text-to-video, image-to-video, and video-to-video; creates clips up to 60 seconds; Apache 2.0 licensed
Beats every open-weight distilled competitor on HeliosBench short and long video, with no KV-cache or quantization tricks

That speed number sounds like a marketing claim, so let me be clear about how it's reached. Helios uses aggressive token compression (8x history reduction via Multi-Term Memory Patchification), a Pyramid Unified Predictor Corrector that cuts generation tokens by another 2.3x, and adversarial hierarchical distillation to reduce sampling steps from 50 to 3 in the Distilled variant. No KV-cache. No quantization. No sparse attention. The efficiency gains come from architectural compression, not the standard shortcuts.

The model is built on top of Wan-2.1-T2V-14B, adapted into an autoregressive generator via Unified History Injection - a technique that feeds historical clean frames alongside the noisy segment being denoised, allowing the bidirectional base to function autoregressively.

Key Specifications

Specification	Details
Provider	Peking University / ByteDance / Canva / Chengdu Anu Intelligence
Model Family	Helios
Parameters	14B (active for all inference)
Max Video Length	1452 frames (~60 seconds at 24 FPS)
Resolution	384 x 640 pixels
Training Data	800,000 video clips (under 10 seconds each)
Supported Tasks	Text-to-Video, Image-to-Video, Video-to-Video
License	Apache 2.0
Release Date	March 4, 2026
Pricing	Free (open weights)
Min VRAM (low-mode)	~6 GB (Group Offloading enabled)
Recommended HW	Single NVIDIA H100

Three variants are available: Helios-Base (50 sampling steps, maximum quality), Helios-Mid (intermediate, uses CFG-Zero for ~2x speedup), and Helios-Distilled (3 sampling steps, real-time via adversarial distillation). All three are open on HuggingFace.

Benchmark Performance

Helios was assessed on HeliosBench, a custom test set of 240 prompts built by the authors because no existing public benchmark covers real-time long-video generation. That's worth flagging: these are author-run benchmarks, not independent third-party evaluations. Treat the numbers as provisional until someone outside the team reproduces them.

Model	Parameters	FPS (H100)	Short Video Score (81f)	Long Video Score (1440f)
Helios-Distilled	14B	19.53	6.00	6.94
Reward Forcing	1.3B	22.13	5.71	6.88
SANA Video Long	2B	13.24	5.65	6.41
Krea-RealTime-14B	14B	6.70	5.88	6.52
Wan 2.1 14B (base)	14B	<1	6.02	4.90

The most interesting comparison is against Wan 2.1 14B, the base model Helios was initialized from. Helios-Distilled runs 52x faster (19.53 vs <1 FPS) and scores higher on long-video generation (6.94 vs 4.90), which is the task where temporal drift matters most. On short video, the base Wan model edges ahead by 0.02 points - essentially tied.

Against Reward Forcing (1.3B), the fastest model in the table at 22.13 FPS, Helios gives up 2.6 FPS for higher scores on both short video (6.00 vs 5.71) and long video (6.94 vs 6.88). The long-video margin on the benchmark is small, but the 200-person user study points the same way: Helios wins 70-92.5% of pairwise comparisons on long video and 56-99.2% on short video across tested prompts.
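The deltas quoted in these comparisons can be recomputed from the table. The sketch below copies the table rows by hand; the `ROWS` layout and the `compare` helper are illustrative names, not part of HeliosBench. Wan 2.1 is listed only as "<1" FPS, so no FPS difference is computed for it.

```python
# Rows copied from the benchmark table: (fps, short_score, long_score).
# Wan 2.1 14B is reported only as "<1" FPS, so its FPS is stored as None.
ROWS = {
    "Helios-Distilled": (19.53, 6.00, 6.94),
    "Reward Forcing": (22.13, 5.71, 6.88),
    "SANA Video Long": (13.24, 5.65, 6.41),
    "Krea-RealTime-14B": (6.70, 5.88, 6.52),
    "Wan 2.1 14B (base)": (None, 6.02, 4.90),
}


def compare(a, b):
    """Return (fps_delta, short_delta, long_delta) for model a minus model b."""
    fps_a, short_a, long_a = ROWS[a]
    fps_b, short_b, long_b = ROWS[b]
    # An FPS delta is only meaningful when both values are exact numbers.
    fps_delta = None if fps_a is None or fps_b is None else round(fps_a - fps_b, 2)
    return fps_delta, round(short_a - short_b, 2), round(long_a - long_b, 2)


for rival in ROWS:
    if rival != "Helios-Distilled":
        print(rival, compare("Helios-Distilled", rival))
```

Positive score deltas favor Helios-Distilled. The only negative score delta is the 0.02-point short-video gap to the Wan 2.1 base.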

Figure: End-to-end throughput (FPS) on a single NVIDIA H100 across all official model variants. Helios-Distilled reaches 19.53 FPS despite having 14B parameters. Source: neurohive.io

Key Capabilities

Real-Time Long-Video Generation

The headline capability is producing videos longer than 30 seconds at near-real-time speeds on a single data center GPU. Prior 14B models like Wan 2.1 run below 1 FPS on the same hardware, so each second of footage takes far longer than a second to produce. Helios-Distilled brings that to 19.5 FPS, close to the 24 FPS playback rate, which opens up interactive applications: live storyboarding tools, game engine integration, on-device content creation.

The autoregressive chunking approach processes 33 frames per chunk, so num_frames values should be multiples of 33. For very long videos (1440+ frames), this means dozens of sequential chunks (44 for the 1452-frame maximum) - which is exactly where most autoregressive video models fail due to temporal drift. Helios addresses this through three explicit anti-drifting techniques baked into training: Relative RoPE (fixed positional encoding regardless of video length), First-Frame Anchor (the opening frame is maintained throughout), and Frame-Aware Corruption (training on deliberately degraded history contexts to build robustness).
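The chunk arithmetic is easy to check. This is a minimal sketch with hypothetical helper names. Rounding a requested length up to the next multiple of 33 is an assumption about how one might pick a valid num_frames, not documented pipeline behavior.

```python
CHUNK_FRAMES = 33   # frames per autoregressive chunk, from the text above
PLAYBACK_FPS = 24   # playback rate used in the specification table


def round_up_to_chunk(num_frames: int, chunk: int = CHUNK_FRAMES) -> int:
    """Smallest multiple of `chunk` that is >= num_frames."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return -(-num_frames // chunk) * chunk  # integer ceiling division


def chunk_count(num_frames: int, chunk: int = CHUNK_FRAMES) -> int:
    """Number of sequential chunks needed to cover num_frames."""
    return round_up_to_chunk(num_frames, chunk) // chunk


def duration_seconds(num_frames: int, fps: int = PLAYBACK_FPS) -> float:
    """Playback length of the clip in seconds."""
    return num_frames / fps


frames = round_up_to_chunk(1440)
print(frames, chunk_count(frames), duration_seconds(frames))  # 1452 44 60.5
```

This also accounts for the 1452-frame maximum in the specification table: it is the first multiple of 33 at or above 1440, and it plays for 60.5 seconds at 24 FPS.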

Unified Multi-Task Support

A single checkpoint handles T2V, I2V, and V2V tasks through a unified input representation. This matters for production workflows where you want one model rather than three separate deployments. The approach is described in the technical report as "Unified History Injection" - historical frames (clean or noisy) are fed with the current segment, so the model sees both conditioning context and the generation target in a single forward pass.
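The idea of one input layout serving three tasks can be shown with a toy sketch. Everything here is an assumption made for illustration, not the authors' implementation: the `Slot` type, the function names, and in particular the mapping from task to history length (no history for T2V, one frame for I2V, the source frames for V2V).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    index: int   # position in the joint sequence
    role: str    # "history" (conditioning) or "target" (to be denoised)


def history_length(task: str, source_frames: int = 0) -> int:
    """Hypothetical mapping from task to number of conditioning frames."""
    if task == "t2v":
        return 0                  # text only, no visual history
    if task == "i2v":
        return 1                  # the input image acts as one history frame
    if task == "v2v":
        if source_frames < 1:
            raise ValueError("v2v needs at least one source frame")
        return source_frames      # the source video acts as the history
    raise ValueError(f"unknown task: {task}")


def inject_history(num_history: int, num_target: int = 33) -> list[Slot]:
    """Lay out history frames followed by the segment being generated."""
    roles = ["history"] * num_history + ["target"] * num_target
    return [Slot(i, role) for i, role in enumerate(roles)]


for task, n in [("t2v", 0), ("i2v", 0), ("v2v", 16)]:
    seq = inject_history(history_length(task, n))
    print(task, len(seq), sum(s.role == "target" for s in seq))
```

In this layout the target segment is always the same size; only the amount of history in front of it changes between tasks. That is one way to read "a single forward pass over conditioning context and generation target".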

Low-VRAM Inference

With Group Offloading enabled, Helios runs on about 6 GB of VRAM. For a 14B model, that's striking: most 14B video models require at least 24 GB, often 40-80 GB for the full pipeline. The same compression ideas pay off in training too, where the configuration fits four 14B models on a single 80 GB H100.
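A back-of-envelope calculation shows why a 6 GB budget implies offloading. The sketch assumes 2 bytes per parameter (bf16 or fp16), which the source does not state. It counts weights only and ignores activations, the text encoder, and the VAE.

```python
PARAMS = 14e9  # parameter count from the specification table


def weight_gb(params: float, bytes_per_param: int) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def resident_fraction(vram_gb: float, params: float = PARAMS,
                      bytes_per_param: int = 2) -> float:
    """Upper bound on the share of the weights that fits in vram_gb at once."""
    return min(1.0, vram_gb / weight_gb(params, bytes_per_param))


print(weight_gb(PARAMS, 2))             # 28.0 GB at 2 bytes per parameter
print(round(resident_fraction(6), 2))   # 0.21
```

Under that assumption, at most about a fifth of the weights can sit on a 6 GB card at any moment. The rest has to be held in host memory and moved in as needed, so the low-VRAM mode should be expected to cost speed.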

Figure: HeliosBench scores across four video length regimes: 81, 240, 720, and 1440 frames. Helios-Distilled leads the distilled models across all regimes. Source: neurohive.io

Pricing and Availability

Helios is fully open-source under Apache 2.0, which permits commercial use, modification, and redistribution. Weights for all three variants (Base, Mid, Distilled) are hosted on Hugging Face, with repositories including BestWishYsh/Helios-Base and BestWishYsh/Helios-Distilled. ModelScope mirrors are also available.

There's no hosted API or SaaS pricing from the authors. Running Helios costs whatever your inference compute costs. On a cloud H100 (roughly $2.50-3.50/hour), the Distilled variant generates 19.5 frames per second, which works out to roughly 49 seconds of 24 FPS video per minute of compute time.
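The throughput-to-cost arithmetic can be checked directly. This is a rough sketch with hypothetical function names. It assumes 24 FPS playback and steady-state generation at 19.5 FPS, and it ignores model loading and idle time.

```python
def video_seconds_per_compute_minute(gen_fps: float = 19.5,
                                     playback_fps: float = 24.0) -> float:
    """Seconds of finished video produced by 60 seconds of generation."""
    return gen_fps * 60.0 / playback_fps


def usd_per_video_minute(hourly_usd: float, gen_fps: float = 19.5,
                         playback_fps: float = 24.0) -> float:
    """Compute cost of one minute of finished video at a given hourly rate."""
    compute_seconds = 60.0 * playback_fps / gen_fps
    return hourly_usd * compute_seconds / 3600.0


print(video_seconds_per_compute_minute())            # 48.75
for rate in (2.50, 3.50):
    print(rate, round(usd_per_video_minute(rate), 3))
```

At the quoted H100 rates this comes to roughly 5 to 7 cents of compute per minute of generated video.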

The project has day-0 Diffusers integration, with support also available through vLLM-Omni and SGLang. ComfyUI integration exists via a community wrapper. The GitHub repository includes training scripts, inference code, and the HeliosBench evaluation framework.

The authors note this is a research release, not a product integration with ByteDance's commercial services. That matters for anyone wondering whether Xiaomi-style surprise deployments (like the Hunter Alpha / MiMo-V2-Pro stealth launch) are coming for Helios. The research-only framing suggests not.

Comparison to Competitors

For local open-source video generation, Helios sits above LTX-2.3 on long-video quality at the cost of lower maximum resolution (384x640 vs LTX-2.3's 4K). LTX-2.3 claims ~18x faster generation than Wan 2.2 on similar hardware, but that comparison is vendor-benchmarked against a different baseline. Kling 3 and Seedance 2 are both closed-source commercial models with notably higher output quality and resolution but no local deployment option.

If you're building something that needs open-weight video generation and can live with 384x640, Helios-Distilled is the current best option for throughput. If quality at higher resolution matters more than speed, the base Wan 2.1 or LTX-2.3 family are better picks. See the AI Image Generation Leaderboard for broader comparisons across the generation ecosystem.

Strengths

Highest FPS among 14B open-weight video models by a wide margin
Quality competitive with the full-scale Wan 2.1 14B base at the throughput of 1.3B distilled models
Apache 2.0 license - commercial use permitted
Diffusers integration means standard tooling works out of the box
~6 GB VRAM mode opens deployment on consumer hardware
All three variants released with training code and eval framework

Weaknesses

384x640 maximum resolution is well below commercial competitors and LTX-2.3
HeliosBench is author-developed, not an independent benchmark
Occasional flickering at segment boundaries (acknowledged in the paper)
No hosted API - requires own compute
Training on 800K short clips may limit diversity of produced content

Our review of LTX-2.3 - the main open-source alternative with higher resolution
Our review of Kling 3 - leading closed-source commercial video model
Our review of Seedance 2 - ByteDance's closed-source video product
AI Image Generation Leaderboard for broader model rankings
Best Local Image Generation Models 2026 - guide on running generation models locally

FAQ

What is Helios best at?

Generating videos longer than 30 seconds at near-real-time speeds on a single GPU. It's the fastest open 14B video model by FPS and leads on long-video quality benchmarks.

Can Helios run on consumer hardware?

Yes, with Group Offloading enabled it needs approximately 6 GB of VRAM, making it compatible with RTX 3060/4060-class cards, though offloading trades speed for memory. The 19.5 FPS figure was measured on an H100.

Is Helios free for commercial use?

Yes. The Apache 2.0 license permits commercial use, modification, and redistribution, provided you keep the license text and attribution notices.

How does Helios compare to LTX-2.3?

Helios is faster and better at long-video temporal coherence; LTX-2.3 supports higher resolutions (up to 4K) and adds synchronized audio generation. Choose based on your resolution and audio requirements.

What resolution does Helios output?

384x640 pixels. This is the main limitation compared to commercial models, which normally output 720p or higher.
