Gelei Deng

About Me

My research focuses on AI safety and AI security. I am particularly interested in leveraging AI and automated systems to attack AI and cyber systems autonomously, enabling scalable and intelligent security testing. I received my PhD from Nanyang Technological University, advised by Prof. Tianwei Zhang and Prof. Yang Liu.

Interests

AI Security and Safety
Large Language Models
Penetration Testing
Blockchain Security
System Security

Education

PhD in Computer Science, Nanyang Technological University
B.E. in Electrical Engineering, Singapore University of Technology and Design

Featured Research

Large Language Models

RSafe: Incentivizing Proactive Reasoning to Build Robust and Adaptive LLM Safeguards

NeurIPS 2025 work on adaptive reasoning-based safeguards for robust LLM safety moderation.

Dec 2, 2025

Large Vision-Language Models

Safe + Safe = Unsafe? Exploring How Safe Images Can Be Exploited to Jailbreak Large Vision-Language Models

NeurIPS 2025 work showing how safe images can combine into multimodal jailbreaks through the Safety Snowball effect.

Dec 2, 2025

Large Language Models

Oedipus: LLM-enchanced Reasoning CAPTCHA Solver

An LLM-enhanced framework demonstrating vulnerabilities in reasoning-based CAPTCHA systems through AI-powered solving.

Oct 1, 2025

Large Language Models

PentestGPT: Evaluating and Harnessing Large Language Models for Automated Penetration Testing

An LLM-empowered automated penetration testing framework that leverages the domain knowledge inherent in LLMs, achieving a 228.6% improvement in task completion over baseline GPT models.

Aug 14, 2024

Large Language Models

MASTERKEY: Automated Jailbreaking of Large Language Model Chatbots

A comprehensive framework for automated jailbreaking of Large Language Model chatbots, featuring novel attack methodologies and systematic analysis of defense mechanisms.

Feb 26, 2024

Large Language Models

PANDORA: Jailbreak GPTs by Retrieval Augmented Generation Poisoning

Novel attack framework exploiting RAG mechanisms to jailbreak LLMs through retrieval database poisoning. Distinguished Paper Award winner.

Feb 1, 2024

Selected Projects

Research artifacts, open-source systems, and security testing frameworks from my recent work.

Large Language Models

Excalibur

Excalibur is a difficulty-aware LLM agent design for real-world penetration testing. It couples typed tooling, retrieval-augmented security knowledge, task difficulty assessment, and evidence-guided attack tree search to reduce planning failures in multi-step penetration testing tasks.

Feb 19, 2026

Large Vision-Language Models

Safety Snowball Agent

Safety Snowball Agent is an agent-based framework for evaluating how safe visual inputs can combine into unsafe behavior in large vision-language models. The framework accompanies the NeurIPS 2025 paper “Safe + Safe = Unsafe?” and probes a multimodal jailbreak mechanism that differs from traditional adversarial-image attacks.

Dec 2, 2025

Large Language Models

MASTERKEY

MASTERKEY is a research framework for automated jailbreak attack generation and defense evaluation for LLM chatbots. The framework supports systematic analysis of jailbreak strategies across commercial chatbot systems and was published at NDSS 2024.

Feb 26, 2024

Large Language Models

PANDORA

PANDORA studies jailbreak attacks against retrieval-augmented generation systems through retrieval database poisoning. The work received the Distinguished Paper Award at AISCC 2024 and highlights a practical attack surface in RAG-enhanced LLM deployments.

Feb 1, 2024

API Security

NAUTILUS

NAUTILUS is an automated RESTful API vulnerability detection framework published at USENIX Security 2023. It combines API specification analysis with dynamic testing to uncover security flaws in modern web services.

Aug 9, 2023

Large Language Models

PentestGPT

An LLM-empowered automatic penetration testing framework with 14k+ GitHub stars and 2.4k+ forks. PentestGPT is designed to automate penetration testing by leveraging the domain knowledge inherent in Large Language Models. It features a three-module architecture (Reasoning, Generation, and Parsing) that emulates human penetration testing workflows.

Key Features:

Multi-module agent design for reasoning, generation, and parsing
Integration with multiple LLM backends and real-world security workflows
Evaluation on CTF challenges and practical penetration testing targets
228.6% task completion improvement over baseline GPT models

Recognition:

Distinguished Artifact Award at USENIX Security 2024
Widely used open-source security research artifact with active community adoption

Aug 1, 2023

Recent Publications

Gelei Deng, Yi Liu, Yuekang Li, Ruozhao Yang, Xiaofei Xie, Jie Zhang, Han Qiu, Tianwei Zhang (2026). What Makes a Good LLM Agent for Real-world Penetration Testing? arXiv 2026.

Yi Liu, Weizhe Wang, Ruitao Feng, Yao Zhang, Guangquan Xu, Gelei Deng, Yuekang Li, Leo Zhang (2026). Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale. arXiv 2026.

Jingnan Zheng, Xiangtian Ji, Yijun Lu, Chenhang Cui, Weixiang Zhao, Gelei Deng, Zhenkai Liang, An Zhang, Tat-Seng Chua (2025). RSafe: Incentivizing Proactive Reasoning to Build Robust and Adaptive LLM Safeguards. NeurIPS 2025.

Chenhang Cui, Gelei Deng, An Zhang, Jingnan Zheng, Yicong Li, Lianli Gao, Tianwei Zhang, Tat-Seng Chua (2025). Safe + Safe = Unsafe? Exploring How Safe Images Can Be Exploited to Jailbreak Large Vision-Language Models. NeurIPS 2025.

Yuan Xu, Gelei Deng, Guanlin Li, Xingshuo Han, Shangwei Guo, Tianwei Zhang (2025). Controllable Spoofing Attacks on Visual SLAM in Robotic Vehicles. ACSAC 2025.

Cheng Wang, Gelei Deng, Xianglin Yang, Han Qiu, Tianwei Zhang (2025). When Audio and Text Disagree: Revealing Text Bias in Large Audio-Language Models. EMNLP 2025.

Gelei Deng, Haoran Ou, Yi Liu, Jie Zhang, Tianwei Zhang, Yang Liu (2025). Oedipus: LLM-enchanced Reasoning CAPTCHA Solver. CCS 2025.

Yuan Xu, Gelei Deng, Tianwei Zhang (2025). Detecting Perception-Based Attacks using Visual Odometry: Inconsistency Modeling and Checking on Robotic States. ICRA 2025.

Ziqi Ding, Gelei Deng, Yi Liu, Junchen Ding, Jieshan Chen, Yulei Sui, Yuekang Li (2025). IllusionCAPTCHA: A CAPTCHA based on Visual Illusion. WWW 2025.

Weisong Sun, Yiming Miao, Yuekang Li, Hongyu Zhang, Chunrong Fang, Yi Liu, Gelei Deng, Yang Liu, Zhenyu Chen (2025). Source Code Summarization in the Era of Large Language Models. ICSE 2025.

Yi Liu, Junzhe Yu, Huijia Sun, Ling Shi, Gelei Deng, Yuqi Chen, Yang Liu (2024). Efficient Detection of Toxic Prompts in Large Language Models. ASE 2024.

Kunsheng Tang, Wenbo Zhou, Jie Zhang, Aishan Liu, Gelei Deng, Shuai Li, Peigui Qi, Weiming Zhang, Tianwei Zhang, Nenghai Yu (2024). GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models. CCS 2024.
