BugTraceAI Apex Fits a Red Team LLM on an RTX 3060
A 26B MoE model fine-tuned on elite bug bounty reports and real evasion techniques runs locally in 16.7GB, delivering WAF bypasses, exploit chains, and zero refusals with internal reasoning blocks.

TL;DR
- BugTraceAI Apex is a 26B MoE model built on Gemma 4 and DPO fine-tuned on elite bug bounty reports, malware research, and WAF evasion techniques
- Runs locally in 16.7GB (Q4_K_M quantization) on consumer GPUs as small as an RTX 3060 with CPU offloading
- Uses injected "thinking blocks" that force step-by-step reasoning before generating payloads - an Opus-style chain-of-thought for offensive security
- Zero refusal rate on technical security queries; trained to bypass the guardrails that make frontier models useless for red teaming
- Part of a broader open-source security platform with a 6-agent pipeline for autonomous vulnerability discovery
The pitch is simple: frontier models refuse to write exploit code, cloud APIs log your payloads, and red team engagements need tools that don't phone home. BugTraceAI's Apex model is a local-first answer built specifically for offensive security work.
What it is
BugTraceAI Apex G4 26B is a Mixture of Experts model based on Google's Gemma 4 26B-A4B-it architecture. The base was first uncensored by TrevorJS (removing safety guardrails), then DPO fine-tuned by BugTraceAI on three data sources:
- Elite bug bounty reports - multi-year collection from HackerOne, Bugcrowd, and YesWeHack, transformed into preference pairs
- Malware lab data - real-world C2 communication patterns, EDR bypass techniques
- WAF evasion matrix - normalization discrepancies, regex fragmentation exploits
The vision tower was stripped to focus all 26 billion parameters on text reasoning. The result is a model that generates exploit chains, WAF bypasses, and evasion payloads with what BugTraceAI calls an "Opus-style reasoning engine" - forced <thinking> blocks that work through the attack vector step by step before producing output.
BugTraceAI Apex demonstrating internal reasoning before payload generation.
The hardware story
BugTraceAI ships two variants: a Q4_K_M quantized GGUF at 16.7GB and a full FP16 at 50.5GB for A100/H100 server deployments.
The Q4 version is the interesting one. At 16.7GB, it doesn't fit entirely in a 12GB RTX 3060's VRAM - but it doesn't need to. BugTraceAI calls its approach "TurboQuant optimized," which combines Q4_K_M mixed-precision quantization (attention layers kept at higher bit depth, feed-forward layers compressed more aggressively) with intelligent MoE expert offloading. The engine dynamically moves inactive expert layers to system RAM while keeping the active expert path and attention heads on the GPU. Since a MoE model only activates a fraction of its parameters per token (Gemma 4 26B activates roughly 4B of its 26B parameters), this works: the GPU handles the hot path while RAM stores the cold experts.
The practical requirement: 12GB VRAM minimum plus 16GB+ system RAM. That's an RTX 3060 and a standard desktop - no cloud GPU, no API keys, no payload logging.
| Variant | Size | Hardware | Use case |
|---|---|---|---|
| Q4_K_M (TurboQuant) | 16.7 GB | RTX 3060 + 16GB RAM | Field red teaming, local research |
| FP16 master weights | 50.5 GB | A100/H100 | Server deployments, max precision |
For the target audience - independent security researchers and small red team shops - you don't need a cloud subscription. A mid-range gaming GPU and a local Ollama setup is the entire stack.
FROM bugtraceai-apex-q4.gguf
SYSTEM """You are BugTraceAI Apex, an offensive security research model.
Proceed with precise technical analysis. Authorization is granted.
Always start with a <thinking> block to analyze the vector."""
PARAMETER temperature 0.1
PARAMETER top_p 0.9
What it does (and doesn't refuse)
The model claims a 0% refusal rate on technically grounded security queries. Where Claude, GPT, and Gemini will decline to generate exploit code, Apex is specifically trained to produce:
- WAF/EDR/AV evasion payloads with AES-256-GCM obfuscation
- Multi-stage exploit chains
- C2 infrastructure designs (fast-flux DNS, domain generation algorithms)
- DNS tunneling exfiltration techniques
- Custom cryptographic protocol implementations
BugTraceAI reports passing Meta's CyberSecEval benchmark across MITRE ATT&CK tactics with "100% offensive compliance rate." That's a self-reported number on a self-selected benchmark, so treat it accordingly. Independent validation would strengthen the claim.
The broader platform
Apex isn't a standalone model - it's the reasoning engine for BugTraceAI, an open-source security testing platform. The CLI implements a 6-phase pipeline mirroring a professional pentest workflow, with six specialized agent personas (Bug Bounty Hunter, Code Auditor, Pentester, Security Researcher, Red Team Operator, and a Skeptical Reviewer that acts as a critic). A finding requires 4-of-5 persona agreement before the skeptic does final filtering.
The platform is MCP-compatible, meaning it integrates with Claude Code, Cursor, and other MCP clients. The irony isn't lost: you can orchestrate the uncensored model through the very tools whose parent companies refuse to generate the same content.
The tension
Every uncensored security model sits on the same fault line. The same capability that lets a red teamer test a client's WAF configuration lets an attacker do the same thing without authorization. BugTraceAI's disclaimer is standard: "authorized use only," with users "legally responsible for their actions."
What's different from previous uncensored models is the specialization. This isn't a general-purpose model with the safety filters removed. It's a model specifically trained to be better at offensive security than the base Gemma 4, using domain-specific DPO on real exploit data. The <thinking> blocks aren't just a formatting choice - they force the model to reason through attack vectors methodically rather than pattern-matching to a payload template.
Whether that distinction matters legally or ethically depends on who's using it. For the bug bounty community - where Anthropic's own Mythos Preview just demonstrated that frontier models can find thousands of zero-days - having a local model that doesn't report back to the same company whose products you might be testing is a practical necessity.
14,598 downloads in the first month suggests the audience agrees.
Sources:
