New Anthropic Safety Updates Focus on Claude’s Well-Being

Claude Opus 4 and 4.1 now have the ability to end “abusive” or “harmful” conversations in consumer chat interfaces. Anthropic says the feature was developed as part of its exploratory work on the protection and well-being of its AI models. The company also envisions broader safety uses, although it notes that having a model defensively terminate a chat is an extreme measure intended for rare cases. “We’re working to identify and implement low-cost interventions to mitigate risks to model welfare,” Anthropic explains, while acknowledging it is unsure whether “such welfare is possible.”