PentestGPT: Evaluating and Harnessing Large Language Models for Automated Penetration Testing

Aug 14, 2024

Gelei Deng

Yi Liu

Víctor Mayoral-Vilches

Peng Liu

Yuekang Li

Yuan Xu

Tianwei Zhang

Yang Liu

Martin Pinzger

Stefan Rass

Figure: PentestGPT Architecture and Workflow

Abstract

Penetration testing, a crucial industrial practice for ensuring system security, has traditionally resisted automation due to the extensive expertise required by human professionals. Large Language Models (LLMs) have shown significant advancements in various domains, suggesting their potential to revolutionize industries. This work establishes a comprehensive benchmark using real-world penetration testing targets and explores the capabilities of LLMs in this domain. We introduce PentestGPT, an LLM-empowered automatic penetration testing tool designed with three self-interacting modules to address individual sub-tasks of penetration testing and mitigate context loss challenges.

Type

Conference paper

Publication

33rd USENIX Security Symposium (USENIX Security 24)

This work introduces PentestGPT, an approach to automated penetration testing built on Large Language Models. The tool targets the long-standing difficulty of automating security testing by drawing on LLMs’ broad domain knowledge and reasoning capabilities.

Key Features:

Three-Module Architecture: Reasoning, Generation, and Parsing modules that work together to emulate human penetration testing workflows
Real-World Evaluation: Comprehensive benchmark using actual penetration testing targets and CTF challenges
Significant Performance Gains: 228.6% increase in task completion compared to the baseline GPT-3.5 model
Community Impact: Over 6,500 GitHub stars, indicating strong community interest

Technical Innovation: PentestGPT addresses critical challenges in LLM-based security testing, including context loss and task-specific reasoning. The framework systematically breaks down complex penetration testing scenarios into manageable sub-tasks, enabling more effective automated security assessments.
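To make the sub-task decomposition concrete, here is a minimal Python sketch of how three cooperating modules might share state through a task tree. It is an illustration under stated assumptions, not PentestGPT's actual code: the names Task, echo_llm, generation_module, parsing_module, and reasoning_step are hypothetical, the prompts are placeholders, and the LLM is replaced by a stub. The idea shown is that the list of sub-tasks lives outside the model's conversation history and is re-rendered into each prompt, which is one plausible way to limit context loss.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical stand-in for an LLM call: takes a prompt, returns text.
LLM = Callable[[str], str]


def echo_llm(prompt: str) -> str:
    """Stub model for the sketch: returns the last prompt line, truncated."""
    return prompt.splitlines()[-1][:40]


@dataclass
class Task:
    """One node in a tree of penetration testing sub-tasks."""
    name: str
    status: str = "todo"  # "todo" or "done"
    finding: str = ""
    children: List["Task"] = field(default_factory=list)

    def next_todo(self) -> Optional["Task"]:
        """Depth-first search for the first unfinished leaf task."""
        if not self.children:
            return self if self.status == "todo" else None
        for child in self.children:
            found = child.next_todo()
            if found is not None:
                return found
        return None

    def render(self, depth: int = 0) -> str:
        """Compact text form of the tree, short enough to resend each turn."""
        line = "  " * depth + f"[{self.status}] {self.name}"
        if self.finding:
            line += f" -> {self.finding}"
        return "\n".join([line] + [c.render(depth + 1) for c in self.children])


def generation_module(llm: LLM, root: Task, task: Task) -> str:
    """Turn one sub-task into a concrete step, given the current tree."""
    return llm(f"Task tree:\n{root.render()}\nGive one concrete step for: {task.name}")


def parsing_module(llm: LLM, raw_output: str) -> str:
    """Condense verbose tool output into a short finding."""
    return llm(f"Summarize the key findings:\n{raw_output}")


def reasoning_step(llm: LLM, root: Task, run_step: Callable[[str], str]) -> bool:
    """Pick the next sub-task, run it, and record the result in the tree.

    Returns False when no unfinished leaf task remains.
    """
    task = root.next_todo()
    if task is None:
        return False
    step = generation_module(llm, root, task)
    task.finding = parsing_module(llm, run_step(step))
    task.status = "done"
    return True


if __name__ == "__main__":
    tree = Task("target", children=[Task("port scan"), Task("web enumeration")])
    # run_step is a placeholder for a human or tool executing the step.
    while reasoning_step(echo_llm, tree, lambda s: f"output of {s}"):
        pass
    print(tree.render())
```

In this sketch the next sub-task is chosen by a plain tree traversal to keep the example deterministic; a real system could instead ask the model to update and prioritize the tree.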

Open Source Impact: The tool has been successfully deployed in real-world penetration testing scenarios and has fostered an active community of security professionals and researchers, validating its practical value in both academic and industrial contexts.

Last updated on Aug 14, 2024