
Qwen-RobotManip
Alibaba's generalist VLA model for robotic manipulation, built on Qwen3.5-4B with a DiT action decoder, trained on 38,100+ hours of open-source data, and ranked first on the RoboChallenge generalist track.


Three papers from today's arXiv: workplace agents jumped from 43% to 89% task completion in two years, a 47-researcher coalition ships a unified eval schema, and agent memory only helps when similarity tops 0.8.

Salesforce agrees to pay $3.6 billion for Fin, the AI customer service agent formerly known as Intercom, adding a proprietary model and 30,000 customers to Agentforce.

Three new papers expose a 50-point gap in agent tool knowledge, show tree search tripling inference throughput, and map the research between AGI and superintelligence.

A new impossibility theorem proves feedback-based training can't guarantee honest AI, while two papers cut agent memory costs 78% and multi-agent latency 7x.

Three new arXiv papers expose how context bloat tanks agent performance, agent memory bleeds private data, and misaligned behavior spreads through multi-agent systems.

New research reveals MCP error messages triple agent attack success rates, ranks eight models on sycophancy with Claude scoring best, and finds self-evolving agents make 30-42% false edits.

Anthropic's Claude Opus 4.8 scores 69.2% on SWE-bench Pro and ships hundreds of parallel subagents in Claude Code, with pricing unchanged at $5 per million input tokens.

Three papers: strategic attack timing exposes gaps in AI control evaluations, Perplexity's agents slash task time by 87%, and Lean4 formal proofs make agent workflows more reliable.

OpenAI's new Lockdown Mode cuts the network exits that prompt injection attacks use to steal data from ChatGPT, but it won't stop malicious instructions from entering the model in the first place.

NVIDIA's 550B Nemotron 3 Ultra, released June 4, tops the US open-weight leaderboard with a hybrid Mamba-Transformer MoE architecture and 300-plus tokens per second throughput.

Three new arXiv papers expose how developers miss AI sabotage 94% of the time, why LLMs converge structurally in code evolution, and how ZK proofs could verify frontier AI training.