Large Vision-Language Models

Safety Snowball Agent

Safety Snowball Agent is an agent-based framework for evaluating how individually safe visual inputs can combine to elicit unsafe behavior in large vision-language models. The framework accompanies the NeurIPS 2025 paper “Safe + Safe = Unsafe?” and probes a multimodal jailbreak mechanism that differs from traditional adversarial-image attacks.

Dec 2, 2025

Safe + Safe = Unsafe? Exploring How Safe Images Can Be Exploited to Jailbreak Large Vision-Language Models

NeurIPS 2025 paper showing how safe images can be combined into multimodal jailbreaks through the Safety Snowball effect.

Dec 2, 2025