A Comprehensive Study of Jailbreak Attack versus Defense for Large Language Models
Abstract
Jailbreak attacks on Large Language Models have become a critical security concern. This comprehensive study systematically analyzes both attack and defense techniques, providing a thorough evaluation of existing methods and identifying key challenges and opportunities for improving LLM security.
Publication
arXiv preprint arXiv:2402.13457
This work studies jailbreak attacks and defenses for Large Language Models, establishing benchmarks and offering insights for developing more robust LLM security mechanisms.