Our commitment to community safety
Mass shootings, threats against public officials, bombing attempts, and attacks on communities and individuals are an unacceptable and grave reality in today’s world. These incidents are a reminder of how real the threat of violence is—and how quickly violent intent can move from words to action.

People may also bring these moments and feelings into ChatGPT. They may ask questions about the news, try to understand what happened, express fear or anger, or talk about violence in ways that are fictional, historical, political, personal, or potentially dangerous. We work to train ChatGPT to recognize the difference—and to draw lines when a conversation starts to move toward threats, potential harm to others, or real-world planning.

We’re sharing what we do to minimize uses of our services in furtherance of violence or other harm: how our models are trained to respond safely, how our systems detect potential risk of harm, and what actions we take when someone violates our policies. We are constantly improving the steps we take to help protect people and communities, guided by input from psychologists, psychiatrists, civil liberties and law enforcement experts, and others who help us navigate difficult decisions around safety, privacy, and democratized access.

Our Model Spec lays out our long-standing principles for how we want our models to behave: maximizing helpfulness and user freedom while minimizing the risk of harm through sensible defaults. We work to train our models to refuse requests for instructions, tactics, or planning that could meaningfully enable violence. At the same time, people may ask neutral questions about violence for factual, historical, educational, or preventive reasons, and we aim to allow those discussions while maintaining clear safety boundaries—for example, by omitting detailed, operational instructions that could facilitate harm. The line between benign and harmful uses can be subtle, so we continually refine our approach and work with experts to help distinguish between safe, bounded responses and actionable steps for carrying out violence or other real-world harm.

As part of this ongoing work, we’ve continued expanding our safeguards to help ChatGPT better recognize subtle signs of risk of harm across different contexts. Some safety risks only become clear over time: a single message may seem harmless on its own, but a broader pattern within a long conversation—or across conversations—can suggest something more concerning. Building on years of work in model training, evaluations and red teaming, and ongoing expert input, we have strengthened how ChatGPT recognizes subtle warning signs across long, high-stakes conversations and responds carefully. We’ll share more about this work in the coming weeks.

Our safety work also extends to situations where users may be in distress or at risk of self-harm. In these moments, our goal is to avoid facilitating harmful acts, and also to help de-escalate the situation and guide people to real-world support. ChatGPT surfaces localized crisis resources, encourages people to reach out to mental health professionals or trusted loved ones, and in the most serious cases directs people…

