
AI Safety Leaderboard: Refusal and Jailbreak Rankings
Rankings of AI models by safety metrics including refusal rates, jailbreak resistance, bias scores, and truthfulness across major benchmarks.

Rankings of the best text-to-speech and speech-to-text AI models by naturalness, accuracy, latency, and pricing.

Rankings of the fastest AI models and inference providers by tokens per second, time to first token, and end-to-end latency.

Rankings of the best small language models under 10 billion parameters, comparing Phi-4, Gemma 3, Qwen 3.5, and more across key benchmarks.

Rankings of the best embedding models by MTEB scores, comparing retrieval quality, dimensions, speed, and pricing for RAG and search.

Rankings of the best AI models for multilingual tasks, covering 16 languages across the Artificial Analysis Multilingual Index and MGSM benchmarks.