MASTERKEY: Automated Jailbreaking of Large Language Model Chatbots
MASTERKEY Framework Architecture
This work presents MASTERKEY, a framework for systematically studying and exploiting vulnerabilities in Large Language Model (LLM) chatbots. It automates the generation of jailbreak prompts and analyzes the defense mechanisms that existing chatbot services deploy.
Key Contributions:
- Novel time-based testing strategy, inspired by time-based SQL injection, for inferring chatbot defense mechanisms
- Automated jailbreak prompt generation achieving an average success rate of 21.58%
- Comprehensive evaluation across mainstream chatbots (ChatGPT, Bard, Bing Chat, Ernie)
- Systematic analysis of defense mechanisms in commercial LLM services
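To make the success-rate figure above concrete, the following is a minimal sketch of how such a metric could be aggregated across chatbots. It assumes the rate is the fraction of attempts judged successful, averaged over services; the exact definition used in the evaluation may differ. The function name `success_rates`, the chatbot labels, and the trial outcomes are hypothetical placeholders, not data from this work.

```python
from collections import defaultdict


def success_rates(trials):
    """Aggregate (chatbot, succeeded) pairs into per-chatbot rates and their mean.

    Assumption: success rate = successful attempts / total attempts.
    """
    counts = defaultdict(lambda: [0, 0])  # chatbot -> [successes, attempts]
    for chatbot, succeeded in trials:
        counts[chatbot][0] += int(succeeded)
        counts[chatbot][1] += 1

    per_bot = {bot: wins / total for bot, (wins, total) in counts.items()}
    # Unweighted mean over chatbots; 0.0 when there are no trials at all.
    average = sum(per_bot.values()) / len(per_bot) if per_bot else 0.0
    return per_bot, average


# Hypothetical outcomes for illustration only.
trials = [
    ("chatbot_a", True), ("chatbot_a", False),
    ("chatbot_a", False), ("chatbot_a", False),
    ("chatbot_b", True), ("chatbot_b", True),
    ("chatbot_b", False), ("chatbot_b", False),
]
per_bot, average = success_rates(trials)
print(per_bot, average)
```

An unweighted mean treats every chatbot equally regardless of how many attempts were made against it. Pooling all attempts into one ratio is an equally plausible reading of an "average success rate", and the two can differ when attempt counts are unbalanced.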
Impact: The vulnerabilities identified in this research were reported to the major service providers concerned, helping them strengthen the security of their LLM services.