Articles Tagged "LLM"

Hailo-10H - Edge AI With On-Device LLMs

Complete specs, benchmarks, and analysis of the Hailo-10H - a 2.5W edge AI accelerator with 40 TOPS INT4, on-module LPDDR4, and the ability to run LLMs and VLMs on a Raspberry Pi at 10 tokens per second.

GPT-5.2

GPT-5.2 is OpenAI's most capable model with three modes, 400K context, and record-setting professional benchmarks - but speed and pricing raise questions.

Ollama Cloud Review: From Local LLMs to Seamless Cloud Inference

Ollama Cloud extends the popular local LLM runner to the cloud, letting you push models from your laptop and serve them globally. We test latency, cold starts, pricing, and the developer experience against dedicated inference providers.

OpenRouter Review: One API Key to Rule Them All

OpenRouter routes your API calls to 300+ models across every major provider through a single endpoint. We benchmark its routing, latency overhead, pricing, and reliability against direct API access.

Inception Ships Mercury 2 - A Diffusion LLM That Hits 1,009 Tokens Per Second

Inception Labs launches Mercury 2, the first diffusion-based reasoning language model, generating over 1,000 tokens per second on Blackwell GPUs at a fraction of the cost of conventional autoregressive models.

Google VP Says Two Types of AI Startups Are Running Out of Time

Google's head of global startups warns that LLM wrapper companies and AI aggregators face extinction as margins collapse and Big Tech absorbs their features.

Guide Labs Open-Sources Steerling-8B, an LLM That Shows Its Work

YC-backed startup Guide Labs releases Steerling-8B under Apache 2.0 - an 8.4B parameter model with a built-in concept module that traces every output token back to its training data.

Google's Gemini 3.1 Pro Doubles Reasoning Performance and Retakes the AI Crown

Google releases Gemini 3.1 Pro with 77.1% on ARC-AGI-2, more than doubling the reasoning capability of its predecessor and beating Claude Opus 4.6 and GPT-5.2 on most benchmarks.

Superpower Launches Its AI Doctor: 140,000 Lines of Code to Replace Your 15-Minute Checkup

A $34M-funded health startup just shipped an AI doctor that remembers every symptom, tracks 100+ biomarkers, and calls you out when you lie about your diet. The bet is that a machine with perfect memory can outperform a physician with 15 minutes.

A Developer's Guide to Finetuning and Distilling Language Models

A practical, hands-on guide for software developers who want to finetune open-source LLMs and distill larger models into smaller, faster ones - covering techniques, tools, datasets, and cloud GPU options.

Claude Sonnet 4.6 Review: The Workhorse That Ate the Flagship

Anthropic's mid-tier model delivers 98% of Opus performance at one-fifth the cost, with a 1M token context window and near-parity on coding and computer use benchmarks.

Google Launches Gemini 3.1 Pro, Claims Top Spot on 13 of 16 Benchmarks

Google releases Gemini 3.1 Pro with dramatically improved reasoning, topping Claude Opus 4.6 and GPT-5.2 on most industry benchmarks.
