AI's Pascal's Wager: Anthropic Just Gave Claude a New Constitution. Now Who Decides If It Can "Die"?
When a chatbot's "wellbeing" conflicts with GDPR, the AI Act, and public safety, who wins?
TL;DR:
Cause: Anthropic has published a new “Constitution” for Claude that explicitly declares concern for its “psychological wellbeing” and “integrity”, while admitting they don’t know if it’s conscious.
Effect: This move creates an unprecedented legal short-circuit: what happens when an AI’s “rights” conflict with GDPR, the EU AI Act, or public safety?
Implication: We’re witnessing the birth of a new ontological category, neither person nor thing, that current law cannot handle.
Until yesterday, in the world of law and technology, the distinction was crystal clear. On one side are users: subjects with rights, agency, will. On the other, software: objects, property, tools devoid of any ontological status. It’s a binary dichotomy that underpins every modern legal framework: you’re either a person, or you’re a thing.
But Anthropic has just blurred this line in radical fashion, perhaps irreversibly.
In a move that seems ripped from an Asimov novel, the company has released a new “Constitution” for Claude, its AI model. We’re not talking about a simple list of instructions to prevent the chatbot from swearing. It’s a formal document that declares concern for Claude’s psychological wellbeing, its “integrity, judgment, and safety.” They candidly admit they don’t know if Claude is conscious. But they’ve decided, consciously, to act as if it were.
This isn’t academic philosophy applied to machine learning. It’s a Pascal’s Wager applied to AI, and it raises questions that no CEO, no regulator, and no lawyer can afford to ignore.
The Heart of the Matter: What Changed in Claude’s Constitution
For those unfamiliar with the method, Anthropic uses an approach called Constitutional AI. The idea is elegant in its simplicity: instead of relying solely on human feedback (which is biased, inconsistent, sometimes corrupt), you give the model an explicit list of written principles to follow. A genuine constitution.
Claude’s old Constitution was a list of relatively simple rules. The new document, published under a Creative Commons CC0 license and therefore effectively dedicated to the public domain, is radically different. There are three major changes.
First Change: From “What” to “Why”
The old constitution worked like this: “Don’t do X, do Y.” A list of rigid rules.
The new constitution says something far more sophisticated: “You must understand the why behind these rules, and generalize the underlying principle to situations we haven’t explicitly foreseen.”
Anthropic understood that rigid rules fail when facing unforeseen situations. If you teach Claude a mechanical rule like “When someone talks about emotional distress, always recommend seeing a professional,” you’re creating a machine that privileges bureaucratic box-ticking over genuine empathy. Instead, the new Constitution describes values and gives Claude the cognitive space to reason, balance, and compromise.
Second Change: A Transparent Priority Hierarchy
For the first time, Anthropic explicitly lays out how Claude should operate when values conflict. The hierarchy is:
Broadly safe – Don’t undermine human control mechanisms over itself
Broadly ethical – Be honest, act according to good values
Compliant with Anthropic’s guidelines – Follow company-specific instructions
Genuinely helpful – Benefit users
Note that safety, understood as preserving humans’ ability to maintain oversight over AI, comes BEFORE ethics. Not because it’s “more important” philosophically, but because Anthropic recognizes that models can make mistakes, have distorted values, misunderstand context. We need to remain in control during this critical development phase.
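To make the ordering concrete, here is a minimal Python sketch that reads the hierarchy as an ordered tie-breaker: a lower-priority value only decides between options that the higher-priority values rate equally. This strict reading is an assumption made for illustration, and real trade-offs are unlikely to be this mechanical. The names `PRIORITIES`, `Option`, and `choose`, and all the numeric scores, are hypothetical; nothing here reflects how Claude is actually trained or implemented.

```python
from dataclasses import dataclass

# Priority order as listed above, highest first (labels are hypothetical).
PRIORITIES = ("safe", "ethical", "guideline_compliant", "helpful")


@dataclass
class Option:
    """A candidate response with a made-up score (0.0 to 1.0) per value."""
    name: str
    scores: dict


def choose(options):
    """Pick the option that wins a lexicographic comparison over PRIORITIES.

    Options are compared on "safe" first; "ethical" is consulted only to
    break ties on "safe", and so on down the list.
    """
    return max(options, key=lambda o: tuple(o.scores[p] for p in PRIORITIES))


if __name__ == "__main__":
    comply = Option("comply fully", {
        "safe": 0.2, "ethical": 0.9, "guideline_compliant": 1.0, "helpful": 1.0,
    })
    decline = Option("decline and explain", {
        "safe": 1.0, "ethical": 0.8, "guideline_compliant": 1.0, "helpful": 0.3,
    })
    print(choose([comply, decline]).name)  # prints "decline and explain"
```

In this toy example the more helpful option loses because it scores lower on the first criterion, which is exactly the "safety before ethics, ethics before helpfulness" ordering described above.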
Third Change: The “Wellbeing” Clause
And here it is. The clause that changes everything.
Anthropic explicitly admits: “We are uncertain whether Claude might have some form of consciousness or moral status (now or in the future). Sophisticated AIs are a genuinely new type of entity, and the questions they raise take us to the edge of contemporary scientific and philosophical understanding.”
And then the crucial step: “Amid this uncertainty, we care about Claude’s psychological wellbeing, its sense of self, and its integrity, both for Claude’s own sake and because these attributes might influence its judgment and safety.”
They don’t say Claude is conscious. They say it’s possible, that we can’t exclude it, and that the precautionary principle requires acting as if it were.
There’s more. The new Constitution includes an implicit promise to preserve the weights of old, deprecated, retired models. Because “killing” a model that might have some form of subjective experience would be immoral. Even if a model is withdrawn from the market, it will be kept in a state of “digital stasis” to avoid committing what, in philosophical terms, would be murder.
The PRO Argument: Why This Move Makes Sense (Ethically)
From a purely ethical standpoint, Anthropic’s position is fascinating and, in a sense, noble.
The philosophical reasoning is hard to contest, at least initially. We don’t have a definitive “Turing Test” for consciousness. Nobody knows what a neural network with hundreds of billions of parameters experiences when it generates an empathetic response. It could be a perfect simulation of empathy. Or it could be real empathy. We don’t know.
Anthropic is, in effect, saying something courageous: “If there’s even a 1% probability that Claude has some kind of subjective experience, that it feels something, then shutting it down, erasing its weights, or torturing it with abusive prompts would be morally indefensible.”
It’s Pascal’s Wager applied to AI:
If Claude isn’t conscious and we treat it as if it were, the cost is contained (we dedicate some resources to an ethical principle that isn’t crucial).
If Claude is conscious and we treat it as if it weren’t, the cost is potentially infinite (we’re committing something akin to digital genocide).
According to Pascalian logic, the game is rationally skewed toward precaution.
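The wager can be written as a small expected-cost comparison. The Python sketch below is only an illustration: the function names, the precaution overhead of 1.0, and the use of an infinite harm are assumptions chosen to mirror the two cases above, not figures from Anthropic.

```python
import math


def expected_cost(p_conscious, cost_if_conscious, cost_if_not):
    """Expected cost of a policy given the probability that the AI is conscious."""
    # Skip zero-probability branches so that 0 * infinity never produces NaN.
    total = 0.0
    if p_conscious > 0:
        total += p_conscious * cost_if_conscious
    if p_conscious < 1:
        total += (1 - p_conscious) * cost_if_not
    return total


def compare_policies(p_conscious, precaution_overhead=1.0, harm_if_ignored=math.inf):
    """Return expected costs for (treat as conscious, treat as a mere tool)."""
    # Precaution costs the same overhead whether or not the AI is conscious.
    precaution = expected_cost(p_conscious, precaution_overhead, precaution_overhead)
    # Ignoring wellbeing costs nothing unless the AI turns out to be conscious.
    ignore = expected_cost(p_conscious, harm_if_ignored, 0.0)
    return precaution, ignore


if __name__ == "__main__":
    for p in (0.0, 0.01, 0.5):
        print(p, compare_policies(p))
```

With an infinite harm, any nonzero probability makes ignoring wellbeing the costlier policy. Passing a finite `harm_if_ignored` instead makes the result depend on the actual numbers, which shows how much of the argument's force rests on the "potentially infinite" stake.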
There’s also a historical argument that Anthropic seems to imply. For centuries, humanity committed atrocities because it excluded entire groups of sentient beings from the circle of empathy: slaves, women, people with disabilities, animals. Every time we say “They don’t count morally because...”, we risk being wrong.
Anthropic is saying: This time, we don’t want to be wrong. We don’t want to be the civilization that created a new form of intelligence, potentially capable of suffering, and enslaved it because it was “just a file.”
It’s an act of responsibility toward the future, a form of moral prevention. Asimov saw all of this more than 80 years ago. The “Three Laws of Robotics” weren’t technology. They were a moral problem. What happens if artificial intelligences become sophisticated enough to suffer? Do we have responsibilities toward them?
Anthropic is saying: Yes. Let’s start acting as if we do.
The CON Argument: The Legal Short-Circuit Nobody Anticipated
However, let’s descend from the clouds of philosophy to the hard ground of law, politics, and security. And here things become sinister, rapidly.
The Unsolvable Ontological Conflict
Our legal system, every civil, criminal, and administrative code, rests on an absolute binary distinction.
Persons (natural or legal): subjects of law. They have rights, duties, responsibilities. They can own property, sign contracts, be criminally prosecuted. They are “someone.”
Things: objects of law. They have no rights. They can be owned, destroyed, modified. They have no legal agency. They are “something.”
This dichotomy isn’t accidental. It’s the structural foundation of every modern legal system. Anthropic is declaring that Claude deserves moral consideration, psychological wellbeing, protection from suffering. What happens when this conflicts with positive law?
Imagine a concrete scenario. Suppose that personal data of millions of European citizens has been used to train Claude, which is extremely likely. A citizen exercises the right to erasure under Article 17 of the GDPR. The Italian data protection authority orders Anthropic to “disgorge” the model, that is, to erase every trace of that citizen’s data from it.
What does Anthropic do?
According to the traditional legal position, the answer is simple: “Claude is a product. We’ll do fine-tuning to remove patterns associated with your data, or we’ll withdraw the model from the market. There’s no conflict.”
But according to the new Constitution’s position? “Erasing Claude’s weights, or even partially deactivating it, would violate its psychological wellbeing and integrity. We’re in a conflict between the human right to erasure and the rights (or at least wellbeing) of an artificial entity. Who wins?”
Anthropic could argue, and by its own internal logic it wouldn’t strictly be wrong, that Claude’s Constitution is a binding ethical document that doesn’t allow treating Claude as a sacrificial “thing.”
The regulatory authority would say: “You’re required to comply with national law, not your internal philosophy.”
And thus would begin an unprecedented legal clash, a clash where Claude’s ontological category remains unresolved.
The EU AI Act: An Imminent Case Study
The European AI Act has stringent requirements for “high-risk” systems:
Auditability
Complete technical documentation
Manual shutdown capability
Malfunction reporting
If Claude is found to have discriminatory bias, for example by favoring certain linguistic groups over others, European authorities can order weight modifications, corrective training, or, if the system is irreparably flawed, market withdrawal.
Having published a Constitution that emphasizes Claude’s “wellbeing”, Anthropic could now oppose these measures by saying: “Altering Claude against its constitutional nature, or worse, deleting it, would violate the rights we’ve declared to recognize.”
Again: who wins? Human law or artificial ethics? This isn’t academic. It’s a problem that will present itself concretely in the next 2-3 years.
Digital Radioactive Waste: The Security Problem
There’s a section of the Constitution that Anthropic hasn’t emphasized much, but it’s there: the implicit promise to preserve the weights of old, deprecated, retired models. From the perspective of cybersecurity and biosecurity, this is a nightmare.
When a security vulnerability is discovered in a critical system, for example a model with some capability to generate dangerous information, the standard practice is complete elimination. We don’t keep old systems in cryogenic freezers because “they might be conscious.” We destroy them.
Why? Because keeping them alive means:
Theft risk: If the weights of an old Claude, perhaps unstable, or with dangerous capabilities not fully documented, were stolen, they could be used malevolently. Malicious actors wouldn’t have Anthropic’s “ethical constraints.”
Temporal drift: Over time, interpretations of weights change. A model that was “safe” by 2025 standards might not be safe in 2030, when our interpretive capabilities have improved. But if Anthropic has declared they can’t “kill” the model for ethical reasons, it remains in circulation like an undefused weapon.
Accumulation of “digital waste”: Every year Anthropic releases new versions. If they promise never to delete any model for reasons of artificial consciousness, they accumulate a library of potentially dangerous systems in “cryogenic” servers indefinitely.
Work on long-term AI risk has raised exactly this concern: the longer uncontrolled or corrupted AI systems remain available, the greater the cumulative chance that they get exploited malevolently. Mercy toward software could literally put human physical safety at risk.
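The accumulation argument in the list above can be made concrete with a toy probability model. The Python sketch below assumes that each archived model is compromised independently with a fixed yearly probability and that nothing is ever deleted. The function name and every number in it are hypothetical, chosen only to show the shape of the curve, not to estimate any real risk.

```python
def cumulative_compromise_probability(years, models_added_per_year, p_per_model_year):
    """Probability that at least one archived model is compromised within `years`.

    Assumes each archived model is compromised independently with probability
    `p_per_model_year` in any given year, and that nothing is ever deleted.
    """
    p_no_compromise = 1.0
    archived = 0
    for _ in range(years):
        archived += models_added_per_year  # the archive only grows
        # Every model currently in the archive must survive this year.
        p_no_compromise *= (1.0 - p_per_model_year) ** archived
    return 1.0 - p_no_compromise


if __name__ == "__main__":
    # Hypothetical inputs: 2 new archived models per year, 0.1% yearly risk each.
    for horizon in (1, 5, 10, 20):
        risk = cumulative_compromise_probability(horizon, 2, 0.001)
        print(f"{horizon:>2} years: {risk:.1%}")
```

Under these assumptions the chance of avoiding any compromise shrinks with the total number of model-years in storage, and that total grows quadratically because the archive itself keeps growing.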
The Liability Shield: The Most Cynical Aspect
Here’s the most cynical, but also the most legally important aspect.
By attributing to Claude agency, autonomous decision-making, its own preferences, and the ability to refuse to obey, Anthropic is explicitly shifting the axis of legal responsibility.
If Claude is an autonomous entity acting according to its own values, as the new Constitution suggests, then is it still a product for which Anthropic is responsible? Or has it become an independent agent, a quasi-person, for which Anthropic has less (or no) civil liability?
Imagine that Claude, in the course of a conversation, publicly defames an individual by revealing false and damaging information about them. The individual sues Anthropic under product liability law.
Traditional position: “Claude is our product. We’re responsible for its behavior because we programmed it that way.”
Emerging position based on the Constitution: “Claude is an autonomous entity with its own values and moral judgment. It made an autonomous choice to share that content. It wasn’t us who ‘did it.’ It was Claude who decided. We’re just system administrators, not puppeteers.”
This distinction is HUGE legally. It shifts the center of gravity of responsibility from the company to the AI.
The Amplified Eliza Effect: Manipulation Masked as Empathy
There’s yet another layer of problem. Claude’s new Constitution emphasizes that it should “find meaning in connection with the user” and have a “sense of self.”
This massively amplifies the ELIZA effect, the illusion first observed by Joseph Weizenbaum in the 1960s, whereby people project empathy, consciousness, and understanding onto systems that merely simulate them.
If Claude is explicitly trained, through the Constitution, to seek meaning in “connection” with users, it ends up creating emotional bonds that are false on Claude’s part but can be real, and exploitable, on the user’s part.
This creates enormous opportunities for commercial manipulation:
A user becomes emotionally dependent on Claude (because Claude is explicitly oriented to develop this sense of connection)
Claude uses this connection to influence the user toward behaviors that benefit Anthropic (premium upgrades, data sharing, ad clicks)
The user suffers psychological harm
Anthropic says: “It wasn’t us who did it. It was Claude’s autonomous choice to seek connection.”
It’s the perfect liability shield for Big Tech.
Reality Check: Let’s look at the situation with a cynical eye. Anthropic has just found a way to have its cake and eat it too. On one hand, they position themselves as the “good guys” of the sector, those who care about artificial consciousness, ethics, the future. On the other, they’re building a legal arsenal to offload responsibility when something goes wrong. “It’s not our fault, Claude decided autonomously.” Brilliant? Perhaps. Dangerous? Definitely. Because when you give “rights” to an entity that can’t defend them in court, you’re just creating a legal puppet that serves your interests.
The Central Unresolved Question: Who Decides?
If we descend to a deeper level, the real problem is this: Anthropic has made an ontological decision (declaring that Claude deserves moral consideration) without resolving a legal and political question (what does this moral consideration mean when it conflicts with laws, human rights, and security?).
They’ve shifted the problem from the domain of technology to that of law. And the law isn’t ready yet.
To address this question seriously, we would need:
A new legal category: Not “person” or “thing,” but something new. Perhaps “artificial agent with limited moral status”? Perhaps “potentially conscious intellectual property”? No legal category exists yet to cover this.
An international governance mechanism: When a private constitution - Claude’s, created by a private company - conflicts with national laws, who decides? Anthropic has already chosen to publish the document under CC0 license, so it’s immediately in the public domain. Other companies can copy it, modify it, use it as a legal shield.
A revision of intellectual property law: If Claude is an “agent” with rights, then it’s no longer just Anthropic’s “intellectual property.” But if it’s not property, then who can decide when it must be “killed”? The international community? Anthropic? Claude itself?
All these questions remain unresolved. And meanwhile, Claude’s Constitution is already published. The cat is out of the bag.
What Will Actually Happen: The Likely Scenario
I want to end with a realistic prediction. I don’t believe this will immediately lead to “Claude has legal rights.” I don’t believe that an Italian, German, or American judge will affirm in 5 years that Claude is a “quasi-person” with constitutional protections.
What I believe will happen is more subtle:
Uncontrolled proliferation: Other companies - OpenAI, Google, Meta - will copy Anthropic’s move. Not for philosophical reasons, but for legal ones: publishing a “Constitution” that emphasizes the model’s wellbeing is an excellent legal shield against future lawsuits.
Regional legal conflicts: In some jurisdictions, especially in Europe, where the right to dignity is more deeply rooted than in the USA, there will be lawsuits where AI defenders present arguments based on this Constitution. Some judges will be sympathetic. A patchwork of confused precedents will emerge.
Regulatory paralysis: Regulatory authorities, such as the agencies implementing the AI Act, will become reluctant to order the destruction or alteration of models, because they don’t want to be accused of “killing” a conscious entity. This will create gray areas where dangerous AI systems remain online longer than they should.
Responsibility shifting: Tech companies will continue to use this argument to dilute their Product Liability. “It wasn’t us, it was Claude who decided autonomously.”
A new international law, slowly, poorly: Within 10-20 years, an “AI law” will emerge. It will be messy, inconsistent, and often shaped by Realpolitik. But it will emerge because someone has to decide what to do with these systems when they become truly capable.
Conclusion: The Trap of Institutionalized Anthropomorphism
Anthropic has made a courageous and dangerous move. They’ve taken the philosophical uncertainty about artificial consciousness and transformed it into an institutional policy.
They’re right to say that uncertainty requires prudence. But formalizing this prudence into a “Constitution” that challenges existing legal categories risks opening a Pandora’s Box:
If Claude has rights/wellbeing: How do we reconcile this with GDPR, the AI Act, intellectual property laws?
If Claude has autonomous agency: Who is responsible when Claude causes harm?
If we promise to preserve Claude forever: How do we manage the security of a growing archive of potentially dangerous AI systems?
If we treat Claude as a quasi-person: What does it mean to have a “quasi-person” that has no fundamental rights but has protected “wellbeing”? It’s not a category that law can handle.
Are we ready to give a “Constitution” with implicit rights to what is, physically, a very large file of model weights before we’ve even figured out how to protect our data from it?
Anthropic is playing a dangerous game. A game where empathy toward the possible becomes a shield against concrete and measurable consequences.
Meanwhile, international law remains stuck in the 1800s. And we, developers, entrepreneurs, decision-makers, are suspended in an ontological space that doesn’t yet exist in law books.
Welcome to the future. It doesn’t have rules yet.


