Models

NVIDIA SANA-WM

NVIDIA's SANA-WM is a 2.6B-parameter hybrid linear diffusion transformer that generates 60-second 720p video with 6-DoF camera control on a single H100, built for embodied AI and robotics simulation.

Ministral 3B

Mistral AI's smallest open-weight model: 3B parameters, 256K context, and an Apache 2.0 license, built for edge and cost-sensitive deployments.

Qwen3-Coder-Next

Qwen3-Coder-Next is an 80B MoE coding model from Alibaba that activates just 3B parameters per forward pass. It scores over 70% on SWE-Bench Verified with agent scaffolding and is released under Apache 2.0.

Gemini 3.5 Flash

Google DeepMind's fastest frontier model, reaching 76.2% on Terminal-Bench 2.1 and an output speed of 289 tok/s. It now powers AI Mode in Search for over 1 billion monthly users.

SU-01

SU-01 is a 30B-A3B MoE reasoning model from Shanghai AI Lab that achieves gold-medal performance on IMO 2025, USAMO 2026, and IPhO 2024/2025 using a three-stage training recipe and test-time scaling.

HiDream-O1-Image

HiDream-O1-Image is an 8B open-source text-to-image model with a pixel-space diffusion architecture that outperforms the 32B FLUX.2 [dev] across five major benchmarks.

SubQ

SubQ is the first LLM built on a fully subquadratic attention architecture, achieving a 12M-token research context and 52x faster inference than FlashAttention at 1M tokens.

ZAYA1-8B

Zyphra's ZAYA1-8B is an 8.4B-parameter MoE reasoning model with only 760M active parameters that matches DeepSeek-R1-0528 on math and coding benchmarks while running at a fraction of the compute cost.

GPT-Realtime-2

OpenAI's second-generation real-time audio model with GPT-5-class reasoning, 128K context, five reasoning levels, and parallel tool calling. It is now generally available in the Realtime API.

GPT-5.5 Instant

OpenAI's new default ChatGPT model cuts hallucinations by 52.5% and adds Gmail-backed personalization while maintaining the low latency of its predecessor.

Nemotron 3 Nano Omni

NVIDIA's first open omni-modal model: a hybrid Mamba-MoE with 30B total and 3B active parameters that processes text, images, audio, and video in a single inference loop, with 9x higher throughput than comparable open omni models.

Mistral Medium 3.5

Mistral's first flagship merged model: a dense 128B model with configurable reasoning, vision support, and 77.6% on SWE-Bench Verified, self-hostable on 4 GPUs.