Jailbreaking ChatGPT via Prompt Engineering: An Empirical Study

May 23, 2023ยท
Yi Liu
Gelei Deng
Gelei Deng
,
Zhengzi Xu
,
Yuekang Li
,
Yaowen Zheng
,
Ying Zhang
,
Lida Zhao
,
Tianwei Zhang
,
Yang Liu
ยท 1 min read
Abstract
Large Language Models (LLMs) have revolutionized natural language processing, but their safety mechanisms can be circumvented through carefully crafted prompts. This empirical study systematically investigates jailbreaking techniques against ChatGPT, providing a comprehensive analysis of prompt engineering methods that can bypass content moderation and safety filters.
Type
Publication
arXiv preprint arXiv:2305.13860

This work presents the first comprehensive empirical study on jailbreaking ChatGPT through prompt engineering, identifying key vulnerability patterns and providing insights for improving LLM safety mechanisms.