Articles Tagged "LLM"

GPT-5.1

GPT-5.1

GPT-5.1 is OpenAI's November 2025 coding and agentic flagship with 400K context, configurable reasoning effort, and 76.3% on SWE-bench Verified.

Qwen3.7-Plus

Qwen3.7-Plus

Alibaba's first multimodal agent model, combining GUI grounding (ScreenSpot Pro 79.0), 1M-token context, and text-plus-vision input at $0.40/M tokens.

Best AI Models for RAG - June 2026

Best AI Models for RAG - June 2026

Gemini 2.5 Flash still leads LIT-RAGBench English RAG accuracy at 87.0%, but the full benchmark data reveals two overlooked entries: GPT-4.1-mini at 84.1% and o4-mini at 83.9%.

MAI-Code-1-Flash

MAI-Code-1-Flash

Microsoft's first in-house coding model, a 137B sparse MoE built natively for GitHub Copilot, beating Claude Haiku 4.5 on SWE-Bench Pro by 16 points.

Best AI Models for Text Summarization - June 2026

Best AI Models for Text Summarization - June 2026

Gemini 2.5 Flash Lite still leads the Vectara hallucination leaderboard at 3.3%, while two new entries - Gemini 3.5 Flash and Mistral Large 3 at $0.50/M - shift the value picture considerably since March.

Llama 3.3 70B Instruct

Llama 3.3 70B Instruct

Meta's Llama 3.3 70B Instruct matches Llama 3.1 405B on instruction following and math while running at 4-5x lower cost, with the lowest hallucination rate of any open-weight model on the Vectara summarization leaderboard.