
Distillation Leaks, Weak Agents, and Research Sabotage
New papers show distillation silently transfers unsafe behaviors, weak agents bottleneck multi-agent pipelines, and frontier AI can't reliably audit sabotaged ML research.


OpenAI, Anthropic, Google, and Microsoft are now sharing attack detection data through the Frontier Model Forum to collectively block Chinese adversarial distillation campaigns.

New details reveal Apple has full data center access to Gemini and can create smaller on-device derivative models - far more control than the original deal disclosed.
![FLUX.2 [klein] 9B](https://awesomeagents.ai/images/models/flux-2-klein-9b_hu_25add23b30e4a4ae.jpg)
Black Forest Labs' 9B parameter distilled image model - sub-second generation with higher quality than the 4B variant, 19.6 GB VRAM, non-commercial license.

A community fine-tune distills Claude Opus 4.6 chain-of-thought reasoning into Qwen3.5-27B via LoRA, racking up 4,000+ downloads in days. No benchmarks yet - but the approach raises familiar questions.

Community fine-tune that distills Claude Opus 4.6 reasoning into Qwen3.5-27B via LoRA. 28B parameters, Apache 2.0, no published benchmarks.
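Reasoning distillation of this kind typically works by sampling chain-of-thought traces from the teacher and fine-tuning the student on them as ordinary supervised text. A minimal sketch of the data-preparation step - the record schema and the `<think>` delimiter are illustrative assumptions, not the actual format used by this fine-tune:

```python
# Pack teacher reasoning traces into chat-style SFT records.
# The field names and <think> tag are hypothetical; real releases
# document their own trace format and chat template.

def make_sft_record(prompt: str, reasoning: str, answer: str) -> dict:
    """Build one training example from a teacher trace."""
    return {
        "messages": [
            {"role": "user", "content": prompt},
            # The student learns to reproduce the teacher's
            # chain of thought before emitting the final answer.
            {"role": "assistant",
             "content": f"<think>{reasoning}</think>\n{answer}"},
        ]
    }

traces = [
    ("What is 17 * 24?",
     "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68",
     "408"),
]
dataset = [make_sft_record(p, r, a) for p, r, a in traces]
```

From here the dataset feeds a standard LoRA fine-tuning run (e.g. with a PEFT-style trainer), which only updates low-rank adapter weights rather than the full 27B parameters.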

Comparing the Claude Opus reasoning-distilled Qwen3.5-27B against the base model - what chain-of-thought distillation adds, and what it costs in context length, multimodal capability, and reliability.

Claude Sonnet 4.6 identifies itself as DeepSeek when prompted in Chinese, just one day after Anthropic accused DeepSeek of industrial-scale distillation attacks. The cause is training data contamination, not an identity crisis - but the timing is spectacular.

Anthropic accuses three Chinese AI labs of industrial-scale distillation attacks using 24,000 fraudulent accounts and 16 million exchanges with Claude. MiniMax ran the largest operation at 13 million exchanges. None of the three companies have responded.

TeichAI, a four-person non-profit, generated 250 reasoning samples from Claude Opus 4.5, fine-tuned open-weight models on the result, and racked up 67,000 downloads. The legal and technical implications are more interesting than the benchmarks.

A practical, hands-on guide for software developers who want to finetune open-source LLMs and distill larger models into smaller, faster ones - covering techniques, tools, datasets, and cloud GPU options.
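The core objective most such guides cover is Hinton-style logit matching: train the student to reproduce the teacher's temperature-softened output distribution. A dependency-free sketch of that loss on plain lists - real training code would compute the same thing over batched tensors in a framework like PyTorch:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2
    as in the classic knowledge-distillation formulation."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2
```

When the student's logits match the teacher's the loss is zero; the temperature controls how much probability mass the soft targets keep on non-argmax tokens, which is where most of the distilled signal lives.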