
DeepSeek V3.2
DeepSeek V3.2 is a 671B-parameter MoE model activating 37B per token that delivers frontier-class reasoning and coding at the lowest API price in the industry - $0.14/$0.28 input, $0.42 output per million tokens.

DeepSeek V3.2 is a 671B-parameter MoE model activating 37B per token that delivers frontier-class reasoning and coding at the lowest API price in the industry - $0.14/$0.28 input, $0.42 output per million tokens.

Google's cheapest Gemini model pairs a 1M-token context window with $0.10/$0.40 per million token pricing, multimodal input, and 359 tokens/second throughput for high-volume production workloads.

Zhipu's GLM-4.7-Flash is a 30B-A3B MoE model that posts 59.2% on SWE-bench Verified and 79.5% on tau2-Bench while running on a single RTX 4090 - MIT licensed and free via the Z.AI API.

Google Gemma 3 27B is a 27B dense multimodal model supporting text and vision with a 128K context window, 140+ languages, and single-GPU deployment - the most capable open model at its size class.

OpenAI's budget API workhorse pairs 128K context with $0.15/$0.60 per million token pricing, solid coding benchmarks, and the broadest third-party ecosystem of any small model.

Meta's Llama 4 Maverick packs 400B total parameters into a 128-expert MoE architecture with only 17B active per token, beating GPT-4o on Chatbot Arena while matching DeepSeek V3 on reasoning at half the active parameters.