Researcher, Automated Red Teaming

about 2 months ago

San Francisco, CA, USA

Mid Level / Senior

Base Salary

$295k - $445k/yr

Responsibilities

Lead the Automated Red Teaming effort to discover failure modes in AI models.
Build scalable, research-driven systems for continuous risk assessment.
Collaborate with vertical risk teams to define threat models and prioritize targets.
Work with the Classifiers team to convert discovered attacks into training data.
Ensure ART outputs are operationally useful for product and engineering stakeholders.
Translate findings into actionable improvements and communicate effectively with various teams.

Strong motivation towards AI safety and reducing catastrophic risks.
Experience in breaking systems and identifying high-severity failure modes.
Hands-on experience with LLMs and agents, including multi-turn behaviors.
Ability to build scalable automation and turn ideas into continuous pipelines.
Solid software engineering fundamentals, including data structures and algorithms.
Experience in adversarial ML, security research, or large-scale evaluation infrastructure is a plus.