A Comprehensive Study of Jailbreak Attack versus Defense for Large Language Models

Feb 20, 2024

Zihao Xu

Yi Liu

Gelei Deng

Yuekang Li

Stjepan Picek



Abstract

Jailbreak attacks on Large Language Models have become a critical security concern. This comprehensive study systematically analyzes both attack and defense techniques, providing a thorough evaluation of existing methods and identifying key challenges and opportunities for improving LLM security.

Type

Preprint

Publication

arXiv preprint arXiv:2402.13457

This work provides a comprehensive study of jailbreak attacks and defenses for Large Language Models, establishing benchmarks and providing insights for the development of more robust LLM security mechanisms.

Last updated on Feb 20, 2024

Large Language Models, AI Security, Jailbreak Attacks, AI Safety

