
Agent Energy Costs, Memory Attacks, and Compute Limits
Three new papers reframe how we measure agent efficiency, defend agent memory from poisoning attacks, and calculate hard accuracy ceilings for transformers.

Subquadratic exits stealth with SubQ, the first frontier model built on a sparse-attention architecture, along with a $29M seed round. SubQ offers a 12M-token context window at a fraction of the cost of Opus.

Kye Gomez open-sourced OpenMythos, a PyTorch reconstruction built on the hypothesis that Anthropic's Mythos is a Recurrent-Depth Transformer with Mixture-of-Experts routing and Multi-Latent Attention.

Three new papers challenge assumptions in MoE routing design, prompt optimization workflows, and LLM reasoning chains - all published this week on arXiv.

A 19-person Meta AI and KAUST team including Jürgen Schmidhuber proposes Neural Computers - systems where the neural network itself is the running computer, trained solely on screen recordings.

A large language model is an AI system trained on billions of words to understand and generate human language. Learn how LLMs work, what they can do, and how to pick the right one.

Three arXiv papers rethink transformer theory, expose fatal flaws in in-context LLM memory, and introduce grey-box agent security testing.

Percepta AI compiled a WebAssembly interpreter into transformer weights, executing programs deterministically at 33K tokens/sec on CPU - though the community remains skeptical of its practical value.

OLMo Hybrid combines transformer attention with Gated DeltaNet, matching OLMo 3 accuracy with 49% fewer tokens and delivering 75% better throughput on long contexts. Fully open - weights, checkpoints, training code, and technical report.

Full specs and critical analysis of the Etched Sohu - a transformer-specific ASIC claiming 500K+ tokens/sec on Llama 70B, built on TSMC 4nm with 144GB HBM3E. Bold claims, but no independent benchmarks yet.