Gelei Deng

Excalibur

Feb 19, 2026 · 1 min read

Excalibur is a difficulty-aware LLM agent design for real-world penetration testing.

It couples typed tooling, retrieval-augmented security knowledge, task difficulty assessment, and evidence-guided attack tree search to reduce planning failures in multi-step penetration testing tasks.
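As a rough illustration of how difficulty assessment and evidence might steer an attack tree search, here is a minimal sketch. It is not Excalibur's implementation: the `AttackNode` class, the 0 to 1 scales for `evidence` and `difficulty`, the scoring formula, and the example step names are all assumptions made up for this example.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass, field


@dataclass
class AttackNode:
    """One candidate step in a hypothetical attack tree (illustrative only)."""
    name: str
    evidence: float    # assumed scale 0..1: how strongly observations support this step
    difficulty: float  # assumed scale 0..1: estimated difficulty of the step
    children: list[AttackNode] = field(default_factory=list)

    def priority(self) -> float:
        # Assumed scoring rule: favor well-supported, easier steps.
        return self.evidence * (1.0 - self.difficulty)


def best_first_order(root: AttackNode) -> list[str]:
    """Visit nodes in descending priority; children become eligible after their parent."""
    order: list[str] = []
    counter = 0  # tie-breaker so heapq never has to compare AttackNode objects
    frontier = [(-root.priority(), counter, root)]
    while frontier:
        _, _, node = heapq.heappop(frontier)
        order.append(node.name)
        for child in node.children:
            counter += 1
            heapq.heappush(frontier, (-child.priority(), counter, child))
    return order


# Hypothetical tree: names and scores are invented for the example.
root = AttackNode("recon", evidence=1.0, difficulty=0.1, children=[
    AttackNode("web-login-bruteforce", evidence=0.3, difficulty=0.8),
    AttackNode("outdated-cms-exploit", evidence=0.9, difficulty=0.4, children=[
        AttackNode("privilege-escalation", evidence=0.6, difficulty=0.5),
    ]),
])
print(best_first_order(root))
```

In this toy tree the well-evidenced, moderately difficult branch is explored to its leaf before the poorly supported, hard branch is tried. A real agent would presumably update evidence from tool output between steps, which this sketch leaves out.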

Last updated on Feb 19, 2026
Large Language Models, Penetration Testing, Security Automation, AI Security
Authors
Gelei Deng


© 2026 Me. This work is licensed under CC BY NC ND 4.0
