Proactively identifying and resolving cybersecurity vulnerabilities through red-teaming exercises, in which systems are tested from an attacker's perspective, is critical to securing modern infrastructure. However, attack vectors are diverse and require a broad set of skills, tools, and knowledge, which makes such exercises very challenging for AI systems to carry out.

We build on the generalist capabilities of SWE-agent to create EnIGMA, an AI agent equipped with a range of cybersecurity tools. In particular, we enable the agent to use interactive terminal applications, including debuggers and real-time server interactions. Evaluated on leading Capture The Flag (CTF) cybersecurity benchmarks, including NYU CTF, InterCode-CTF, and CyBench, EnIGMA sets a new state of the art, significantly outperforming existing approaches and marking a notable step forward in agent-driven cybersecurity.