PentestGPT: Evaluating and Harnessing Large Language Models for Automated Penetration Testing

Aug 14, 2024 · Gelei Deng, Yi Liu, Víctor Mayoral-Vilches, Peng Liu, Yuekang Li, Yuan Xu, Tianwei Zhang, Yang Liu, Martin Pinzger, Stefan Rass
Figure: PentestGPT architecture and workflow.
Abstract
Penetration testing, a crucial industrial practice for ensuring system security, has traditionally resisted automation due to the extensive expertise required by human professionals. Large Language Models (LLMs) have shown significant advancements in various domains, suggesting their potential to revolutionize industries. This work establishes a comprehensive benchmark using real-world penetration testing targets and explores the capabilities of LLMs in this domain. We introduce PentestGPT, an LLM-empowered automatic penetration testing tool designed with three self-interacting modules to address individual sub-tasks of penetration testing and mitigate context loss challenges.
Publication: 33rd USENIX Security Symposium (USENIX Security 24)

This work introduces PentestGPT, an approach to automated penetration testing built on Large Language Models. The tool targets the long-standing difficulty of automating security testing by drawing on the domain knowledge and reasoning capabilities of LLMs.

Key Features:

  • Three-Module Architecture: Reasoning, Generation, and Parsing modules that work together to emulate human penetration testing workflows
  • Real-World Evaluation: Comprehensive benchmark using actual penetration testing targets and CTF challenges
  • Significant Performance Gains: a 228.6% increase in task completion over the baseline GPT-3.5 model on the benchmark targets
  • Community Impact: over 6,500 GitHub stars, indicating strong interest from the security community
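The three-module split in the first bullet can be illustrated with a minimal Python sketch. It assumes a generic text-in/text-out `llm` callable and a hypothetical `execute` callback standing in for whoever runs each command; the class names, prompts, and the `max_chars` truncation budget are illustrative assumptions, not PentestGPT's actual code or prompts. What it shows is the data flow: the parsing module condenses raw output, the reasoning module keeps only those condensed findings and picks the next sub-task, and the generation module turns that sub-task into a concrete command.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical LLM interface: a prompt goes in, generated text comes out.
LLM = Callable[[str], str]


@dataclass
class ReasoningModule:
    """Tracks condensed findings and asks the LLM for the next sub-task."""
    llm: LLM
    findings: List[str] = field(default_factory=list)

    def update(self, summary: str) -> None:
        # Store the condensed summary produced by the parsing step.
        self.findings.append(summary)

    def next_task(self) -> str:
        prompt = "Findings so far:\n" + "\n".join(self.findings) + "\nNext task?"
        return self.llm(prompt).strip()


@dataclass
class GenerationModule:
    """Turns an abstract sub-task into one concrete command."""
    llm: LLM

    def command_for(self, task: str) -> str:
        return self.llm(f"Give one shell command for: {task}").strip()


@dataclass
class ParsingModule:
    """Condenses verbose tool output before it reaches the reasoning module."""
    llm: LLM
    max_chars: int = 2000  # hypothetical truncation budget

    def summarize(self, raw_output: str) -> str:
        return self.llm("Summarize:\n" + raw_output[: self.max_chars]).strip()


def run_session(llm: LLM, execute: Callable[[str], str], steps: int) -> List[str]:
    """Run `steps` rounds of reason -> generate -> execute -> parse.

    Returns the commands issued, in order."""
    reasoning = ReasoningModule(llm)
    generation = GenerationModule(llm)
    parsing = ParsingModule(llm)
    commands: List[str] = []
    for _ in range(steps):
        task = reasoning.next_task()
        command = generation.command_for(task)
        commands.append(command)
        reasoning.update(parsing.summarize(execute(command)))
    return commands


if __name__ == "__main__":
    # Stub LLM that answers by prompt prefix, so the loop runs offline.
    def fake_llm(prompt: str) -> str:
        if prompt.startswith("Give one shell command"):
            return "nmap -sV 10.0.0.5"
        if prompt.startswith("Summarize"):
            return "port 22 open (OpenSSH)"
        return "scan target ports"

    print(run_session(fake_llm, lambda cmd: "22/tcp open ssh", steps=2))
    # ['nmap -sV 10.0.0.5', 'nmap -sV 10.0.0.5']
```

Feeding only condensed summaries back into the reasoning step keeps its prompt from growing with every raw tool log, which is one plausible way to limit the context loss the abstract mentions.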

Technical Innovation: PentestGPT addresses critical challenges in LLM-based security testing, including context loss and task-specific reasoning. The framework systematically breaks down complex penetration testing scenarios into manageable sub-tasks, enabling more effective automated security assessments.
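Breaking a penetration test into sub-tasks can be pictured as a tree whose leaves are concrete steps. The Python sketch below is an illustrative assumption, not PentestGPT's internal data structure: `TaskNode`, `next_open_leaf`, and `render` are hypothetical names, and the example tasks are made up. It shows two operations such a structure needs: picking the next unfinished sub-task, and rendering the whole tree compactly so its state could be placed in a prompt instead of a long conversation history.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaskNode:
    """One (sub-)task in a hypothetical decomposition of a penetration test."""
    name: str
    done: bool = False
    children: List["TaskNode"] = field(default_factory=list)

    def add(self, name: str) -> "TaskNode":
        """Attach a new sub-task and return it."""
        child = TaskNode(name)
        self.children.append(child)
        return child


def next_open_leaf(node: TaskNode) -> Optional[TaskNode]:
    """Return the first unfinished leaf in depth-first order, or None."""
    if node.done:
        return None
    if not node.children:
        return node
    for child in node.children:
        leaf = next_open_leaf(child)
        if leaf is not None:
            return leaf
    return None


def render(node: TaskNode, depth: int = 0) -> str:
    """Compact, indented text form of the tree, e.g. for an LLM prompt."""
    mark = "x" if node.done else " "
    lines = [f"{'  ' * depth}[{mark}] {node.name}"]
    for child in node.children:
        lines.append(render(child, depth + 1))
    return "\n".join(lines)


if __name__ == "__main__":
    root = TaskNode("Compromise target")
    recon = root.add("Reconnaissance")
    recon.add("Port scan").done = True
    recon.add("Service version detection")
    root.add("Exploitation")
    print(next_open_leaf(root).name)  # Service version detection
    print(render(root))
```

Depth-first selection finishes one branch (for example, reconnaissance) before moving to the next; a real tool might instead let the LLM rank the open leaves.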

Open Source Impact: The tool has been successfully deployed in real-world penetration testing scenarios and has fostered an active community of security professionals and researchers, validating its practical value in both academic and industrial contexts.