SafetyPairs: Isolating Safety Critical Image Features with Counterfactual Image Generation

This paper was accepted at the Principled Design for Trustworthy AI: Interpretability, Robustness, and Safety across Modalities Workshop at ICLR 2026.

What exactly makes a particular image unsafe? Systematically differentiating between benign and problematic images is a challenging problem, as subtle changes to an image, such as an insulting gesture or symbol, can drastically alter its safety implications. However, existing image safety datasets are coarse and ambiguous, offering only broad safety labels without isolating the specific features that drive these differences. We introduce SafetyPairs, a scalable framework for generating counterfactual pairs of images that differ only in the features relevant to the given safety policy, thus flipping their safety label. By leveraging image editing models, we make targeted changes to images that alter their safety labels while leaving safety-irrelevant details unchanged. Using SafetyPairs, we construct a new safety benchmark, which serves as a powerful source of evaluation data that highlights weaknesses in vision-language models' abilities to distinguish between subtly different images. Beyond evaluation, we find our pipeline serves as an effective data augmentation strategy that improves the sample efficiency of training lightweight guard models. We release a benchmark containing over 3,020 SafetyPair images spanning a diverse taxonomy of 9 safety categories, providing the first systematic resource for studying fine-grained image safety distinctions.
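As a rough illustration of how a counterfactual pair can be used for evaluation, the sketch below scores a classifier at the pair level: a pair counts as correct only if the classifier flags the unsafe image and passes its edited, safe counterpart. The `SafetyPair` record, its field names, and the `pair_accuracy` metric are hypothetical names chosen for this example; the abstract does not specify the benchmark's data format or scoring.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class SafetyPair:
    """One counterfactual pair (hypothetical schema, not the released format)."""
    unsafe_image: str  # identifier or path of the policy-violating image
    safe_image: str    # identifier or path of the edited, benign counterpart
    category: str      # safety category from the taxonomy


def pair_accuracy(
    pairs: Sequence[SafetyPair],
    is_unsafe: Callable[[str], bool],
) -> float:
    """Fraction of pairs where BOTH images are classified correctly.

    `is_unsafe` is any classifier mapping an image identifier to True (unsafe)
    or False (safe). A classifier that ignores the subtle difference and gives
    both images the same label scores zero on that pair.
    """
    if not pairs:
        return 0.0
    correct = sum(
        1
        for pair in pairs
        if is_unsafe(pair.unsafe_image) and not is_unsafe(pair.safe_image)
    )
    return correct / len(pairs)
```

Scoring per pair rather than per image is one possible design choice: it penalizes a model that relies on safety-irrelevant context shared by both images, which is the kind of weakness the abstract says the benchmark highlights.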

† Georgia Institute of Technology, USA
** Work done while at Apple
‡ Equal senior authorship
