Publications

(2026). What Makes a Good LLM Agent for Real-world Penetration Testing? arXiv 2026.
(2026). Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale. arXiv 2026.
(2025). Safe + Safe = Unsafe? Exploring How Safe Images Can Be Exploited to Jailbreak Large Vision-Language Models. NeurIPS 2025.
(2025). RSafe: Incentivizing Proactive Reasoning to Build Robust and Adaptive LLM Safeguards. NeurIPS 2025.
(2025). Controllable Spoofing Attacks on Visual SLAM in Robotic Vehicles. ACSAC 2025.
(2025). When Audio and Text Disagree: Revealing Text Bias in Large Audio-Language Models. EMNLP 2025.
(2025). Oedipus: LLM-enchanced Reasoning CAPTCHA Solver. CCS 2025.
(2025). IllusionCAPTCHA: A CAPTCHA based on Visual Illusion. WWW 2025.
(2025). Source Code Summarization in the Era of Large Language Models. ICSE 2025.