Safety Snowball Agent

Dec 2, 2025

Safety Snowball Agent is an agent-based framework for evaluating how individually safe visual inputs can combine to elicit unsafe behavior from large vision-language models.

The framework accompanies the NeurIPS 2025 paper “Safe + Safe = Unsafe?” and probes a multimodal jailbreak mechanism that differs from traditional adversarial-image attacks.