Safety Snowball Agent
Dec 2, 2025 · 1 min read
Safety Snowball Agent is an agent-based framework for evaluating how individually safe visual inputs can combine to elicit unsafe behavior from large vision-language models.
The framework accompanies the NeurIPS 2025 paper “Safe + Safe = Unsafe?” and probes a multimodal jailbreak mechanism that differs from traditional adversarial-image attacks.