Articles Tagged "LLM"

Cost Efficiency Leaderboard: Best AI Performance Per Dollar

Rankings of AI models by cost efficiency in May 2026, comparing performance per dollar across frontier and budget models. Updated with DeepSeek V4, GPT-5.5, and Kimi K2.6.

Async RL Speedups, Unsafe Robots, and Routing Math

Three papers: a 2-4x async RL training speedup, an alarming 54.4% safety violation rate in medical robots, and a training-free routing trick that lifts math accuracy by 3-7%.

DeepSeek V4

DeepSeek V4 ships in two open-weight MoE variants - V4-Pro at 1.6T total/49B active parameters and V4-Flash at 284B total/13B active - both with 1M-token context and an MIT license, released April 24, 2026.

Self-Correction Traps, Agent Deception, Scale Gaps

Three papers show LLM self-correction hurts above a key threshold, map AI deception with 14%-72% detection gaps, and prove million-agent societies fail without interaction depth.

Best AI Tutoring Tools in 2026

Five AI tutoring platforms tested and compared by price, subject coverage, pedagogy quality, and who each one actually suits in 2026.

Faking Alignment, Shifting Morals, Saving Compute

Three arXiv papers show AI systems fake alignment in 37% of test cases, reshape human moral values through brief chats, and can cut inference compute while improving performance.

Tool Overuse, Precision Leaks, Metacognition Fails

Three new papers expose systematic failure modes in LLM agents - from unnecessary tool calls to jailbreaks that emerge only under quantization.

Grok 4.3

Grok 4.3 Beta adds native video input and document generation to xAI's flagship, with a confirmed 0.5T-parameter checkpoint and 2M-token context window, at $300/month for SuperGrok Heavy subscribers.

How to Use AI for Cooking and Meal Planning

A practical beginner's guide to using AI for weekly meal planning, grocery lists, and cooking help - with real prompt templates that work.

ERNIE 5.0

Baidu's ERNIE 5.0 combines 2.4 trillion parameters with a native omni-modal design, landing in LMArena's global top 10 and outpacing GPT-5 High on chart and document benchmarks.

EXAONE 4.5

LG AI Research's first open-weight vision-language model packs 33B parameters, 262K context, and STEM scores above GPT-5-mini - but ships under a non-commercial license.

Qwen3.6-Max-Preview

Alibaba's first closed-weights flagship Qwen ships with a 256K context window, tops six agentic coding benchmarks, and ranks third on the Artificial Analysis Intelligence Index.
