AI Safety

When Audio and Text Disagree: Revealing Text Bias in Large Audio-Language Models

Revealing and analyzing text bias in Large Audio-Language Models when audio and text inputs disagree.

Nov 1, 2025

Glitch Tokens in Large Language Models: Categorization Taxonomy and Effective Detection

A comprehensive taxonomy and effective detection methods for glitch tokens in Large Language Models.

Jul 15, 2024

MASTERKEY: Automated Jailbreaking of Large Language Model Chatbots

A comprehensive framework for automated jailbreaking of Large Language Model chatbots, featuring novel attack methodologies and systematic analysis of defense mechanisms.

Feb 26, 2024

A Comprehensive Study of Jailbreak Attack versus Defense for Large Language Models

A comprehensive analysis of jailbreak attack and defense techniques for Large Language Models.

Feb 20, 2024

Digger: Detecting Copyright Content Mis-usage in Large Language Model Training

A novel approach to detecting misuse of copyrighted content in Large Language Model training data.

Jan 1, 2024

Prompt Injection Attack against LLM-integrated Applications

A comprehensive study of prompt injection attacks against LLM-integrated applications.

Jun 9, 2023

Jailbreaking ChatGPT via Prompt Engineering: An Empirical Study

A comprehensive empirical study of jailbreaking techniques against ChatGPT through prompt engineering.

May 23, 2023

The Threat of Offensive AI to Organizations

A comprehensive analysis of offensive AI threats to organizations and strategies for defense.

Jan 1, 2023