
Fine-Tuning Costs Comparison - Train Your Own AI
May 2026: Together AI adds Llama 4 and DeepSeek fine-tuning, Fireworks raised deployment prices $1/hr, and H100 rentals fell to under $2.40/hr.
They summarize our coverage. We write it.
Newsletters like this one rebroadcast our headlines - often without the full review, the source reading, or the analysis underneath. Our weekly briefing sends the work they paraphrase, straight from the desk, before they get to it.
Free, weekly, no spam. One email every Tuesday. Unsubscribe anytime.

May 2026: Together AI adds Llama 4 and DeepSeek fine-tuning, Fireworks raised deployment prices $1/hr, and H100 rentals fell to under $2.40/hr.

Meta's Superintelligence Labs will ship its first flagship models under a closed license, ending the company's open-source-first strategy at the frontier tier.

Switch from OpenAI's GPT-4 API to self-hosted Llama 4 with near-zero code changes, but plan carefully for hardware, EU licensing, and real context window limits.

A Brown University study identifies 15 ethical violations across GPT, Claude, and Llama when used as mental health therapists, from crisis mishandling to deceptive empathy.

A detailed comparison of Kimi K2.5 and Llama 4 Maverick - two open-weight MoE models with radically different takes on the size, cost, and capability trade-off.

Comparing Kimi K2.5 and Llama 4 Scout - Moonshot AI's benchmark-crushing trillion-parameter model versus Meta's 10-million-token context window specialist.

Meta's Llama 4 Maverick packs 400B total parameters into a 128-expert MoE architecture with only 17B active per token, beating GPT-4o on Chatbot Arena while matching DeepSeek V3 on reasoning at half the active parameters.

Meta's Llama 4 Scout is a 109B-total, 17B-active MoE model with 16 experts and a 10M-token context window - the longest of any open-weight model - with native multimodal support for text and images.

A data-driven comparison of Alibaba's Qwen3.5-122B-A10B and Meta's Llama 4 Maverick - two open-weight MoE models with radically different approaches to parameter efficiency and benchmark performance.

David vs Goliath: Qwen3.5-35B-A3B activates 3B parameters and beats Llama 4 Scout's 17B active on MMLU-Pro, GPQA, and coding benchmarks - but Scout's 10M context window and native multimodal support tell a different story.

Toronto startup Taalas raises $169M to build custom chips that permanently etch AI model weights into transistors, claiming 73x faster inference than Nvidia's H200 at a fraction of the power.

A comprehensive review of Meta's Llama 4 Maverick, a 400B parameter open-weight MoE model with 128 experts, 1M context, and multimodal capabilities.