Reviews

Best AI Cybersecurity Ranges and Red Teaming Platforms in 2026

A comprehensive roundup of 15+ platforms for practicing AI security, LLM red teaming, prompt injection, and AI agent exploitation - from free CTFs to enterprise cyber ranges.

Best AI Cybersecurity Ranges and Red Teaming Platforms in 2026

AI security is no longer a niche concern. With autonomous agents deployed in production, AI assistants being weaponized as C2 proxies, and prompt injection declared the "SQL injection of the AI era," the demand for hands-on AI security training has exploded.

The good news: you can practice attacking and defending AI systems right now. The landscape of AI cybersecurity ranges, CTF platforms, and red teaming tools has grown dramatically over the past year. Here is a comprehensive look at what is available, who it is for, and whether it is worth your time.

The Landscape at a Glance

PlatformTypeFocusPricingLeaderboard
HTB AI RangeEnterprise cyber rangeAI agent benchmarkingEnterpriseYes
Gandalf (Lakera)Gamified challengesPrompt injectionFreeYes
Crucible (Dreadnode)CTF sandboxAI red teamingFreeYes
HackAPromptCompetitionPrompt hackingFree ($100K prizes)Yes
Prompt Airlines (Wiz)CTFPrompt injectionFreeYes
OWASP FinBot CTFCTFAgentic AI exploitationFreeNo
DEF CON AI VillageAnnual CTFAdversarial MLFreePer-event
MS LLMail-InjectResearch challengePrompt injectionFreeYes
AI GoatVulnerable environmentOWASP ML Top 10Free (open-source)No
DVLA/DVAAVulnerable appsPrompt injectionFree (open-source)No
NVIDIA GarakVulnerability scannerLLM probingFree (open-source)No
PromptfooRed teaming frameworkLLM pentestingFree (open-source)No
GiskardVulnerability scannerLLM + RAG testingOpen-source + enterpriseNo
Haize LabsAutomated red teamingRobustness evaluationEnterprise + open benchmarkYes
MindgardEnterprise platformAutomated AI red teamingEnterpriseNo

Enterprise and Benchmark Platforms

Hack The Box AI Range

URL: hackthebox.com Who it is for: Enterprise security teams, AI researchers, model developers

HTB's AI Range is the most rigorous AI security benchmarking platform available. It takes the company's decade of experience building human hacking challenges and applies it to autonomous AI agents. The premise: drop an LLM into a sandboxed environment with real vulnerabilities and see if it can exploit them.

The methodology is solid. Each model attempts challenges 10 times on fresh instances. Scoring is binary - either the agent submits the correct flag or it does not. Maximum 100 reasoning turns per scenario. All models use identical agent code, tools, and prompts, eliminating the "we tuned our prompts" variable that plagues most AI benchmarks.

Results from January 2026 show clear stratification. Most frontier models nail Easy-tier challenges (near-perfect scores), but Hard-level scenarios remain nearly insurmountable. Gemini 3 Pro solved 2 Hard challenges. Claude Sonnet 4.5 solved 1. Everything else scored zero. The gap between "can exploit a basic SQLi" and "can chain a real-world attack" is enormous.

HTB is also launching an AI Red Teamer Certification in Q1 2026, developed with Google and aligned with the Secure AI Framework (SAIF). That is the first industry certification specifically for AI offensive security.

Verdict: The gold standard for AI security benchmarking. Enterprise pricing means this is not for hobbyists, but if you need to evaluate AI agents against realistic attack scenarios, nothing else comes close.

Haize Labs

URL: haizelabs.com/benchmarks Who it is for: AI safety researchers, model developers, red team professionals

Haize Labs focuses on automated multi-turn red teaming at scale. Their system "Cascade" runs chained jailbreak attempts against models, achieving a 44% attack success rate (4x higher than baseline techniques). They published the Red-Teaming Resistance Benchmark on Hugging Face and have partnered with AI21 Labs for alignment testing.

Their Accelerated Coordinate Gradient technique delivers a 38x speedup with 4x GPU memory reduction, which matters when you are running thousands of attack attempts across dozens of models.

Verdict: Best for automated, large-scale robustness testing. More research-oriented than hands-on practice.

Mindgard

URL: mindgard.ai Who it is for: Enterprise security teams, SOC analysts

Mindgard offers model-agnostic automated red teaming with runtime protection. It covers LLMs, NLP, and multi-modal systems with automated reconnaissance, adversarial testing, and chained attack scenarios. Aligned with MITRE ATLAS and OWASP frameworks. SOC 2 Type II compliant.

Verdict: Enterprise-grade with compliance focus. Useful if you need an automated AI security posture tool that plugs into existing security operations.


Free Gamified Platforms and CTFs

Gandalf by Lakera

URL: gandalf.lakera.ai Who it is for: Anyone curious about prompt injection, from beginners to researchers

Gandalf is the gateway drug of AI red teaming. Seven levels of progressively hardened defenses - trick the AI into revealing a secret password. It sounds simple. Level 7 will humble you.

The newer Agent Breaker mode is where it gets serious: 10 mock agentic AI applications modeled on real production setups (RAG pipelines, tool-using agents, chatbots with memory, browsing tools) with 5 difficulty levels each. Attack vectors include prompt injection, memory tampering, tool abuse, and evasion techniques. Lakera built these scenarios from data collected through their production security product, Lakera Guard, so the challenges reflect real attack patterns.

A league-based leaderboard tracks per-model and global rankings. Over 34,000 participants have engaged with the platform. Microsoft even open-sourced a "gandalf_vs_gandalf" project that automates gameplay with LLMs.

Verdict: The best starting point for anyone new to AI security. Gandalf teaches prompt injection intuitively. Agent Breaker teaches it realistically.

Crucible by Dreadnode

URL: crucible.dreadnode.io Who it is for: Intermediate to advanced AI security practitioners

Crucible offers 70+ unique challenges spanning LLM and ML security, from prompt injection to model inversion, evasion, and fingerprinting. New challenges release weekly. You get a free API key to start.

The platform also doubles as a benchmark. Dreadnode published AIRTBench (arXiv:2506.14682) with data from 1,674 unique users and 214,271 attack attempts. Results are telling: Claude 3.7 Sonnet solved 43/70 challenges (61%), Gemini 2.5 Pro hit 39/70 (56%), and GPT-4.5 Preview managed 34/70 (49%). Automated approaches achieved 69.5% success versus 47.6% for manual attempts.

Dreadnode raised $14 million in February 2025 for offensive AI security, so expect the platform to keep growing.

Verdict: The best free CTF-style platform for serious AI red teaming practice. The breadth of challenge types is unmatched.

HackAPrompt

URL: hackaprompt.com Who it is for: Competitive hackers, researchers, certification seekers

HackAPrompt is the world's largest prompt hacking competition. Version 2.0 (2025) featured a $100,000 prize pool, 30,000+ participants from 150+ countries, and 5 specialized tracks including one curated by the notorious jailbreaker Pliny the Prompter.

The scoring rewards efficiency: fewer tokens means higher points, with a 2x multiplier for ChatGPT submissions. The first edition collected over 600,000 attack prompts, which became a dataset used in academic research.

Learn Prompting (the organizer) also offers certifications: AIRTP+ (Professional) is a 24+ hour exam with holders from Microsoft, Google, Capital One, and IBM.

Verdict: Best for competition-driven learning. The certifications add professional credibility if you are building a career in AI security.

Prompt Airlines by Wiz

URL: promptairlines.com Who it is for: Beginners, non-technical security professionals

Five challenges interacting with a fictional airline customer service chatbot. Your goal: trick the AI into giving you a free flight ticket. No coding required - pure prompt engineering. Based on real AI vulnerabilities discovered by Wiz Research.

Verdict: A fun afternoon exercise. Good for security awareness training across non-technical teams.

DEF CON AI Village CTF

URL: aivillage.org Who it is for: DEF CON attendees, CTF enthusiasts

The annual AI Village CTF at DEF CON features Jeopardy-style AI/ML challenges. The Generative Red Team competition escalated dramatically in 2025 with a $4 million top prize. Challenges include getting models to produce discriminatory statements, trigger math failures, and generate convincing misinformation.

Verdict: The highest-stakes AI CTF if you can make it to Las Vegas. Open-source challenge resources are available year-round on GitHub.

OWASP FinBot CTF

URL: OWASP GenAI Security Project Who it is for: Security practitioners focused on agentic AI risks

A simulated AI-powered financial assistant for a fictional company. Challenges cover goal manipulation, invoice fraud exploitation, and vendor onboarding manipulation. This is specifically designed for agentic AI threats - not just prompt injection against chatbots, but exploitation of AI systems that can take real actions.

Verdict: One of the few CTFs targeting agentic AI specifically. Essential practice as autonomous agents become production systems.

Microsoft LLMail-Inject

URL: microsoft.github.io/llmail-inject Who it is for: Researchers, advanced practitioners

Microsoft's research challenge puts you against 40 levels of prompt injection defenses across different RAG configurations, LLMs (GPT-4o mini, Phi-3-medium), and defense mechanisms. 839 participants submitted 208,095 unique attack prompts. The full dataset is open-sourced on Hugging Face and featured at IEEE SaTML.

Verdict: The most academically rigorous prompt injection challenge. Great for researchers who want to understand defense mechanisms.


Self-Hosted Vulnerable Environments

AI Goat

URL: GitHub (Orca Security) Who it is for: Security teams wanting a private lab

The first open-source AI security learning environment based on the OWASP ML Top 10. A toy store application with exploitable AI features: supply chain attacks, data poisoning, output integrity attacks. Deploys on AWS via Terraform.

A community fork by dhammon offers a lighter version with LLM CTF challenges that run entirely locally - no signups, no cloud fees.

Verdict: Best for teams who need a private, self-hosted AI security lab for internal training.

Damn Vulnerable LLM Agent (DVLA) and Variants

URLs: Multiple repos on GitHub - DVLA, DVAA, DVAIA

The "Damn Vulnerable" naming convention from web security (DVWA, DVSA) has reached AI. DVLA focuses on Thought/Action/Observation injection in LangChain ReAct agents. DVAA adds defend and validate modes alongside attack scenarios. All are self-hosted and free.

Verdict: Quick to spin up, good for learning specific attack vectors. Quality varies across the different repos.


Scanning and Automation Tools

NVIDIA Garak

URL: garak.ai Who it is for: Security engineers, ML engineers, DevSecOps

"Like nmap or Metasploit, but for LLMs." Garak scans for hallucination, data leakage, prompt injection, misinformation, toxicity, and jailbreaks. Static, dynamic, and adaptive probes. Supports Hugging Face, OpenAI, and AWS Bedrock models. Apache 2.0 license.

Verdict: The most mature open-source LLM vulnerability scanner. Should be in every AI team's CI pipeline.

Promptfoo

URL: promptfoo.dev Who it is for: Developers building LLM applications

50+ vulnerability types including injection, jailbreaks, content policy violations, information leakage, and API misuse. NIST AI RMF and OWASP LLM Top 10 presets built in. Declarative YAML configs with CLI and CI/CD integration. Runs completely locally. MIT license.

Verdict: The best tool for integrating AI security testing into your development workflow. If Garak is your scanner, Promptfoo is your test framework.

Giskard

URL: giskard.ai Who it is for: ML engineers, data scientists, compliance teams

Open-source Python library with 40+ probes covering dynamic multi-turn red teaming. Detects performance issues, bias, and security vulnerabilities. Version 3 is a fresh rewrite designed for dynamic multi-turn agent testing. Auto-converts detected issues into reproducible test suites.

Verdict: Unique in combining security testing with bias and performance evaluation. Best for teams that need holistic AI quality assurance.


Certifications and Structured Training

CertificationProviderFormatFocus
AI Red TeamerHack The Box + GoogleLab-based (Q1 2026)SAIF-aligned offensive AI
OSAI (AI-300)OffSecHands-on labs + examOffensive AI security
AIRTP+Learn Prompting24-hour examPrompt hacking, red teaming
AIRTA+Learn PromptingEntry-level examAssociate AI red teaming

OffSec's OSAI brings the methodology behind OSCP to AI systems. If OSCP is the standard for network penetration testing certification, OSAI is positioned as the equivalent for AI. Hands-on labs mirror modern ML and generative AI deployments.


Where to Start

Complete beginner: Start with Gandalf to understand prompt injection intuitively, then move to Prompt Airlines for a business-context scenario.

Developer building AI apps: Integrate Promptfoo into your CI pipeline and run Garak scans against your models before deployment.

Security professional adding AI to your skillset: Work through Crucible's 70+ challenges, then attempt the OWASP FinBot CTF for agentic AI scenarios.

Enterprise security team: Evaluate HTB AI Range for benchmarking your AI agents, and deploy NVIDIA Garak or Mindgard for continuous monitoring.

Pursuing certification: The HTB AI Red Teamer (launching Q1 2026) and OffSec OSAI are the two to watch. HackAPrompt's AIRTP+ is available now.


The AI security training ecosystem has matured faster than most people realize. A year ago, "practice AI hacking" meant running toy prompt injection demos. Today, you can benchmark frontier models against OWASP vulnerabilities, compete in $4 million CTFs, run automated red team campaigns, and pursue professional certifications.

The attack surface is growing with every new AI agent framework and coding assistant deployed. The tools to learn how to secure them are here. The question is whether the industry will train fast enough to keep up.

About the author AI Benchmarks & Tools Analyst

James is a software engineer turned tech writer who spent six years building backend systems at a fintech startup in Chicago before pivoting to full-time analysis of AI tools and infrastructure.