PersonaTeaming: Exploring How Introducing Personas Can Enhance Automated AI Purple-Teaming

This paper was accepted on the Workshop on Regulatable ML (ReML) at NeurIPS 2025.

Current developments in AI governance and security analysis have known as for red-teaming strategies that may successfully floor potential dangers posed by AI fashions. Many of those calls have emphasised how the identities and backgrounds of red-teamers can form their red-teaming methods, and thus the sorts of dangers they’re prone to uncover. Whereas automated red-teaming approaches promise to enrich human red-teaming by enabling larger-scale exploration of mannequin conduct, present approaches don’t take into account the position of identification. As an preliminary step in direction of incorporating individuals’s background and identities in automated red-teaming, we develop and consider a novel technique, PersonaTeaming, that introduces personas within the adversarial immediate technology course of to discover a wider spectrum of adversarial methods. Specifically, we first introduce a technique for mutating prompts based mostly on both “red-teaming professional” personas or “common AI consumer” personas. We then develop a dynamic persona-generating algorithm that robotically generates varied persona varieties adaptive to completely different seed prompts. As well as, we develop a set of latest metrics to explicitly measure the “mutation distance” to enrich current range measurements of adversarial prompts. Our experiments present promising enhancements (as much as 144.1%) within the assault success charges of adversarial prompts by persona mutation, whereas sustaining immediate range, in comparison with RainbowPlus, a state-of-the-art automated red-teaming technique. We talk about the strengths and limitations of various persona varieties and mutation strategies, shedding mild on future alternatives to discover complementarities between automated and human red-teaming approaches.

† Carnegie Mellon College
‡ Impartial Researcher
** Work carried out whereas at Apple

Main Menu

What's Hot

Reworking enterprise operations: 4 high-impact use circumstances with Amazon Nova

Your information to Day 2 of RoboBusiness 2025

Night Honey Chat: My Unfiltered Ideas

PersonaTeaming: Exploring How Introducing Personas Can Enhance Automated AI Purple-Teaming

Reworking enterprise operations: 4 high-impact use circumstances with Amazon Nova

Reinvent Buyer Engagement with Dynamics 365: Flip Insights into Motion

From Habits to Instruments – O’Reilly

Evaluating the Finest AI Video Mills for Social Media

Utilizing AI To Repair The Innovation Drawback: The Three Step Resolution

Midjourney V7: Quicker, smarter, extra reasonable

Meta resumes AI coaching utilizing EU person knowledge

Reworking enterprise operations: 4 high-impact use circumstances with Amazon Nova

Your information to Day 2 of RoboBusiness 2025

Night Honey Chat: My Unfiltered Ideas

Coming AI rules have IT leaders anxious about hefty compliance fines

Main Menu

Subscribe to Updates

What's Hot

PersonaTeaming: Exploring How Introducing Personas Can Enhance Automated AI Purple-Teaming

Related Posts