What Makes a Good LLM Agent for Real-world Penetration Testing?

Feb 19, 2026 · Gelei Deng, Yi Liu, Yuekang Li, Ruozhao Yang, Xiaofei Xie, Jie Zhang, Han Qiu, Tianwei Zhang
Abstract
This work analyzes LLM-based penetration testing agents, identifies distinct engineering and planning failure modes, and introduces Excalibur, a difficulty-aware penetration testing agent that couples typed tooling, retrieval-augmented knowledge, and evidence-guided attack tree search.
Publication
arXiv preprint arXiv:2602.17622

This paper examines why LLM-based penetration testing systems succeed or fail in real-world settings, then proposes Excalibur, which uses difficulty-aware reasoning to improve task selection and attack-chain planning.