All You Need Is A Fuzzing Brain: Our Key Achievements in DARPA’s AI Cyber Challenge

Published: September 2025

Introduction

We’re thrilled to share the technical details behind our team’s success in DARPA’s Artificial Intelligence Cyber Challenge (AIxCC). Our team, All You Need Is A Fuzzing Brain, placed 4th among the seven finalists, autonomously discovering 28 security vulnerabilities, including 6 previously unknown zero-days, in real-world open-source projects.

This blog post provides an in-depth look at our LLM-powered Cyber Reasoning System (CRS) that made this achievement possible. Our complete system is open source on GitHub.

The Challenge: Autonomous Vulnerability Discovery

DARPA’s AIxCC challenged teams to build systems that could:

  1. Generate Proofs-of-Vulnerability (POVs): Create inputs that trigger sanitizer errors in fuzzing harnesses
  2. Produce Patches: Generate diff files that fix vulnerabilities while preserving functionality
  3. Operate Autonomously: Work across three challenge modes (Delta-Scan, Full-Scan, Report-Based)

The competition featured real-world C/C++ and Java projects, with scoring based on accuracy and speed.

Our Approach: The FuzzingBrain Architecture

System Overview

FuzzingBrain consists of four core services running in parallel across ~100 VMs (a minimal hand-off sketch follows the list):

  • CRS Web Service: Central coordinator handling task decomposition and fuzzer distribution
  • Static Analysis Service: Provides function metadata, reachability analysis, and call paths
  • Worker Services: Execute 23 parallel LLM-based strategies for POV/patch generation
  • Submission Service: Manages deduplication, SARIF validation, and bundle creation
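
To make the division of labor concrete, here is a minimal, hypothetical sketch of the hand-off between the four services. The function and type names are illustrative stubs, not the actual CRS interfaces, which are networked services spread across many VMs.

    from dataclasses import dataclass, field

    @dataclass
    class Task:
        project: str
        mode: str                          # "delta-scan", "full-scan", or "report-based"
        harnesses: list = field(default_factory=list)

    def static_analysis(task):             # stub: function metadata + reachability
        return {h: ["parse_input"] for h in task.harnesses}

    def run_worker_strategies(task, reachable):  # stub: the 23 LLM strategies
        return [("pov", h) for h in task.harnesses]

    def submit(candidates):                # stub: dedup, SARIF validation, bundling
        print(f"submitting {len(set(candidates))} unique candidate(s)")

    def handle_task(task):                 # CRS Web Service: decompose and dispatch
        reachable = static_analysis(task)
        submit(run_worker_strategies(task, reachable))

    handle_task(Task("curl", "delta-scan", ["fuzz_url"]))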

Massive Parallelization

Our key insight was that parallelization is everything. We deployed (a concurrency sketch follows the list):

  • ~100 virtual machines with 32-192 cores each
  • 100-10,000 concurrent threads per VM
  • 23 distinct LLM-based strategies running simultaneously
  • Multiple fuzzing harnesses processed concurrently
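
As a rough illustration of what strategy-level parallelism looks like on a single VM, here is a hedged Python sketch. The strategy and harness names, and run_strategy itself, are hypothetical stand-ins for the CRS's worker entry points.

    from concurrent.futures import ThreadPoolExecutor, as_completed

    STRATEGIES = [f"strategy_{i}" for i in range(23)]    # 23 LLM-based strategies
    HARNESSES = ["fuzz_http", "fuzz_parser"]             # hypothetical harness names

    def run_strategy(strategy, harness):
        # In the real system this drives an LLM dialogue plus a fuzzer run.
        return strategy, harness, "no-crash"

    with ThreadPoolExecutor(max_workers=256) as pool:    # scaled to 100-10,000 per VM
        futures = [pool.submit(run_strategy, s, h)
                   for s in STRATEGIES for h in HARNESSES]
        for fut in as_completed(futures):
            strategy, harness, outcome = fut.result()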

This architecture enabled us to discover most vulnerabilities within the first 30 minutes of analysis.

Technical Deep Dive: LLM-Powered Strategies

POV Generation: 10 Specialized Strategies

Our POV generation employs iterative, dialogue-based interaction with LLMs:

Base Strategy (xs0_delta)

System Prompt → User Message (diff + harness) → LLM Response → 
Python Execution → Fuzzer Test → Feedback Loop
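
In code, the loop looks roughly like the sketch below. call_llm, extract_python, and run_fuzzer are hypothetical helpers standing in for the model API, response parsing, and the harness runner; the real system also controls execution of generated scripts rather than exec'ing them directly.

    def call_llm(messages):                  # stub for the model API
        return "open('input.bin', 'wb').write(b'A' * 64)"

    def extract_python(response):            # stub: pull the generator script out
        return response

    def run_fuzzer(harness, input_path):     # stub: True would mean a sanitizer crash
        return False

    def xs0_delta(diff, harness, max_iters=5):
        messages = [
            {"role": "system", "content": "You write crash-triggering inputs."},
            {"role": "user", "content": f"Diff:\n{diff}\n\nHarness:\n{harness}"},
        ]
        for _ in range(max_iters):
            script = extract_python(call_llm(messages))
            exec(script)                     # the LLM's script writes input.bin
            if run_fuzzer(harness, "input.bin"):
                return "input.bin"           # POV found
            messages.append({"role": "user",
                             "content": "No crash. Coverage feedback: ..."})
        return None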

Advanced Enhancements

Multi-Input Generation: Generate 5 test cases per iteration instead of 1

Coverage-Guided Feedback: For failed attempts, we provide the following, as sketched after the list:

  • Executed functions with ±3 lines of context
  • Branch decisions and control flow information
  • LCOV-style reports for C/C++, JaCoCo for Java
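
A hypothetical sketch of how such feedback could be assembled: given executed line numbers per file (as parsed from an LCOV or JaCoCo report), attach ±3 lines of source context to each.

    def context_snippet(source_lines, line_no, radius=3):
        lo = max(0, line_no - 1 - radius)
        hi = min(len(source_lines), line_no + radius)
        return "\n".join(f"{i + 1}: {source_lines[i]}" for i in range(lo, hi))

    def coverage_feedback(covered, sources):
        # covered: {"parser.c": [12, 40]}; sources: {"parser.c": ["line 1", ...]}
        chunks = []
        for path, lines in covered.items():
            for line_no in lines:
                chunks.append(f"Executed {path}:{line_no}\n"
                              f"{context_snippet(sources[path], line_no)}")
        return "Your input reached:\n\n" + "\n\n".join(chunks)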

Vulnerability Category Prompting: Targeted prompts for specific CWE classes (illustrated after the list):

  • C/C++: Buffer overflows, use-after-free, integer overflows (10 categories)
  • Java: Deserialization, injection, XXE processing (15 categories)
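
Illustratively, category prompting can be as simple as appending a CWE-specific focus to the base prompt. The prompt text below is a stand-in; the CRS's own templates (10 C/C++ categories, 15 Java categories) are in the repository.

    CWE_PROMPTS = {
        "c_buffer_overflow":    "Overrun a fixed-size stack or heap buffer.",
        "c_use_after_free":     "Force a free followed by a use of the same pointer.",
        "java_deserialization": "Supply serialized data that is unsafely deserialized.",
        "java_xxe":             "Supply XML with an external entity the parser resolves.",
    }

    def category_prompt(base_prompt, category):
        return f"{base_prompt}\n\nFocus on this bug class: {CWE_PROMPTS[category]}"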

Patching: 13 Sophisticated Strategies

Our patching follows a rigorous 7-step workflow (a compressed sketch follows the list):

  1. Target Function Identification: LLM-based analysis to find vulnerable functions
  2. Metadata Extraction: Complete source code and context retrieval
  3. Patch Generation: LLM produces revised function implementations
  4. Function Rewrite: Replace original with LLM-generated content
  5. Diff Creation: Generate standard .diff files
  6. Validation: Compilation + POV testing + functionality tests
  7. Iterative Refinement: Structured feedback for failed attempts
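
Compressed into code, the workflow looks roughly like this. Every helper below is a stub standing in for the real LLM calls and the build/POV/test validation harness.

    def identify_target_function(report):        # step 1: LLM-based localization
        return "parse_header"

    def llm_rewrite(source, report, feedback):   # step 3: LLM patch generation
        return source.replace("strcpy", "strncpy")

    def validate(patched):                       # step 6: compile + POV + tests
        ok = "strncpy" in patched
        return ok, "" if ok else "still crashes on the POV"

    def generate_patch(report, functions, max_iters=5):
        name = identify_target_function(report)
        original = functions[name]               # step 2: full source and context
        feedback = ""
        for _ in range(max_iters):
            revised = llm_rewrite(original, report, feedback)  # steps 3-4
            ok, feedback = validate(revised)     # steps 5-6 (diff + checks)
            if ok:
                return revised                   # the CRS emits a .diff here
        return None                              # step 7 exhausted for this target

    print(generate_patch("CWE-121 report", {"parse_header": "strcpy(dst, src);"}))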

Novel XPatch Strategy

Our XPatch approach generates patches even without POVs (sketched after the list):

  • Delta-scan: Analyze all modified functions from commit diffs
  • Full-scan: LLM scoring of fuzzer-reachable functions (1-10 scale)
  • Validation: 60-second LibFuzzer runs to detect new crashes
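
A hypothetical sketch of the full-scan path: score fuzzer-reachable functions with an LLM, patch the highest-scoring candidates, and keep a patch only if a short LibFuzzer run stays crash-free. score_function and build_and_patch are illustrative stubs; -max_total_time is a standard LibFuzzer flag.

    import subprocess

    def score_function(source):                  # stub for the LLM's 1-10 risk score
        return 8 if "memcpy" in source else 2

    def libfuzzer_crashes(binary, seconds=60):
        # LibFuzzer targets honor -max_total_time=<s>; non-zero exit means a crash.
        proc = subprocess.run([binary, f"-max_total_time={seconds}"])
        return proc.returncode != 0

    def xpatch(reachable_functions, build_and_patch):
        ranked = sorted(reachable_functions.items(),
                        key=lambda kv: score_function(kv[1]), reverse=True)
        for name, _source in ranked[:3]:         # try the highest-risk candidates
            binary = build_and_patch(name)       # stub: apply LLM patch, rebuild
            if not libfuzzer_crashes(binary):
                yield name                       # patch survived the 60-second run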

Multi-Model Resilience

We implemented a robust fallback mechanism across frontier LLMs:

  • claude-3.7, chatgpt-latest, claude-opus-4, o3, gemini-2.5-pro
  • Up to 5 iterations per model before automatic fallback
  • Leverages complementary strengths of different models

When one model fails, the system seamlessly transitions to the next, maximizing success rates.
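
A minimal sketch of the fallback chain, assuming a generic call_model stub (each provider's real SDK call differs):

    MODELS = ["claude-3.7", "chatgpt-latest", "claude-opus-4", "o3", "gemini-2.5-pro"]

    def call_model(model, messages):             # stub for the provider-specific call
        raise TimeoutError("model unavailable")

    def with_fallback(messages, attempts_per_model=5):
        for model in MODELS:
            for _ in range(attempts_per_model):  # up to 5 iterations per model
                try:
                    return model, call_model(model, messages)
                except Exception:
                    continue                     # retry, then fall through
        return None, None                        # every model exhausted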

Static Analysis: Custom Toolchains

C/C++ Analysis Pipeline

  • LLVM + SVF + Bear: Generate bitcode → construct call graphs → compute reachability
  • Engineering Challenges: Missing headers, large bitcode files (>50MB)
  • Solutions: Parallel analysis, bitcode trimming, and 10-minute timeouts (see the sketch below)
  • Performance: >95% success rate on curl, dropbear, sqlite3
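
The timeout-and-parallelism discipline might look like the following sketch, which assumes SVF's wpa analyzer is installed and that each project compiles to a single bitcode module; both are simplifications of the real pipeline.

    import subprocess
    from concurrent.futures import ProcessPoolExecutor

    def analyze_bitcode(bc_path, timeout=600):   # 10-minute cap per module
        try:
            subprocess.run(["wpa", "-ander", bc_path], timeout=timeout, check=True)
            return bc_path, "ok"
        except subprocess.TimeoutExpired:
            return bc_path, "timeout"            # trimmed and retried out of band

    if __name__ == "__main__":
        modules = ["curl.bc", "dropbear.bc", "sqlite3.bc"]
        with ProcessPoolExecutor() as pool:
            for path, status in pool.map(analyze_bitcode, modules):
                print(path, status)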

Java Analysis with CodeQL

  • Custom Queries: Reachable functions and call path computation
  • Batch Processing: 1,000 fuzzer-target pairs per batch
  • Isolated Execution: Cloned databases prevent race conditions (see the sketch below)
  • Speed: <5 minutes for Apache Zookeeper, Tika, Commons-Compress
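
Batch isolation can be sketched as copying the database per batch before invoking the standard codeql database analyze command; the query path and batching plumbing here are assumptions, not the CRS's exact code.

    import shutil
    import subprocess

    def analyze_batch(db_path, query, batch_id):
        clone = f"{db_path}-batch{batch_id}"
        shutil.copytree(db_path, clone)          # isolated database per batch
        subprocess.run(["codeql", "database", "analyze", clone, query,
                        "--format=sarifv2.1.0",
                        f"--output=batch{batch_id}.sarif"], check=True)

    def run_batches(pairs, db_path, query, batch_size=1000):
        for i in range(0, len(pairs), batch_size):   # 1,000 pairs per batch
            analyze_batch(db_path, query, i // batch_size)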

Competition Results: By the Numbers

Final Round Performance

  • 28 total vulnerabilities discovered
  • 6 zero-day vulnerabilities (previously unknown)
  • 14 successful patches with functionality preservation
  • 4th place out of 7 finalist teams
  • 60 total challenges in the final round

Key Performance Insights

  • Speed: Most vulnerabilities found within 30 minutes
  • Effectiveness: LLM strategies discovered nearly all POVs vs. traditional fuzzing
  • Scalability: Successfully handled 50+ fuzzers per project
  • Accuracy: Maintained high precision through rigorous validation

Lessons Learned & Optimizations

Resource Management

  • Sanitizer Selection: Prioritized AddressSanitizer over MemorySanitizer and UndefinedBehaviorSanitizer
  • Time Budgeting: 60-minute caps on LLM fuzzing, reduced to 45 minutes once POVs had been found
  • Patch Submission Limits: Max 5 POV-based patches + 3 XPatches per vulnerability (collected in the configuration sketch below)
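
For reference, the budget knobs above written out as configuration (values are from this post; the constant names are illustrative):

    LLM_FUZZING_CAP_MINUTES = 60     # hard cap on LLM fuzzing per task
    LLM_FUZZING_CAP_WITH_POV = 45    # tightened once a POV exists
    MAX_POV_BASED_PATCHES = 5        # per vulnerability
    MAX_XPATCHES = 3                 # per vulnerability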

API Cost Control

We exhausted our OpenAI credits in one round because workers were assigned to code that no fuzzing harness could reach. Our fixes:

  • Static analysis preprocessing to identify reachable targets
  • Early termination for impossible scenarios
  • Resource allocation based on reachability analysis

Open Source Contributions

FuzzingBrain CRS

Complete system implementation with all 23 strategies: github.com/o2lab/afc-crs-all-you-need-is-a-fuzzing-brain

LLM Vulnerability Detection Leaderboard

Benchmarking framework for comparing LLMs on security tasks: fuzzingbrain.github.io/FuzzingBrain-Leaderboard

Research Team

Our interdisciplinary team spanned three institutions:

Texas A&M University: Jeff Huang (PI), Ze Sheng, Qingxiao Xu, Jianwei Huang, Matthew Woodcock, Guofei Gu

City University of Hong Kong: Heqing Huang

Imperial College London: Alastair F. Donaldson

Looking Forward

The success of FuzzingBrain demonstrates the immense potential of LLM-powered security analysis. Key areas for future research:

  • Model Specialization: Training domain-specific models for vulnerability detection
  • Hybrid Approaches: Better integration of symbolic execution and dynamic analysis
  • Scalability: Extending to larger codebases and more programming languages
  • Real-time Defense: Applying techniques to production security monitoring

Conclusion

Our 4th place finish in DARPA’s AIxCC represents a significant milestone in autonomous vulnerability discovery. By combining massive parallelization, sophisticated LLM strategies, and robust engineering, we created a system capable of finding critical security flaws at unprecedented scale and speed.

The cybersecurity landscape is evolving rapidly, and we believe LLM-powered analysis will play an increasingly crucial role in defending against emerging threats. We’re excited to continue pushing the boundaries of what’s possible in automated security research.


Want to learn more? Check out our complete technical paper and explore our open source implementation.

Questions or collaboration opportunities? Reach out to our team at [contact information].