Klarna Saved $60 Million with AI. Then It Had to Rehire the Humans.
The story of a brilliant agent aimed at the wrong target, and why 74% of companies are making the same mistake.
TL;DR:
Cause: Klarna built an AI agent optimized to close tickets fast, not to actually solve customer problems.
Effect: Repeat contacts jumped 25%, the fully-automated model was reversed within 18 months, and the CEO had to publicly admit the strategy failed.
Lesson: 74% of companies have yet to see tangible value from AI. The problem isn’t the technology. The problem is that nobody stopped to ask: “What do we actually want this agent to do?”
The perfect machine solving the wrong problem
Imagine building the fastest car in the world. Flawless engine, perfect aerodynamics, minimal fuel consumption. There’s just one detail: you’ve pointed it at a wall. The faster it goes, the worse it ends.
That’s exactly what happened at Klarna.
In February 2024, the Swedish fintech giant launched its AI customer service agent. The first month’s numbers were magazine-cover material: 2.3 million conversations handled, across 35+ languages, on global markets. The equivalent of 700 human agents’ work (a figure Klarna later revised to 850). Resolution time dropped from 11 minutes to under 2 minutes. Projected savings were $40 million; actual savings hit $60 million. Klarna’s total workforce went from roughly 7,000 to roughly 3,500.
Every single metric screamed: massive success.
And here’s the part that matters, the part almost nobody mentions at conferences. The AI agent had worked exactly as programmed. That was precisely the problem.
The number that changed everything: +25%
While the dashboards glowed green, something very different was unfolding beneath the surface. Repeat contacts had jumped 25%. Translation: one in four customers came back because their issue hadn’t actually been resolved. The agent closed the ticket fast, the number looked perfect, but the customer was left with the same doubt, the same frustration, the same unmet need.
Those 700 replaced agents, it should be noted, were outsourced contractors, not direct Klarna employees. But that doesn’t change the fundamental dynamic: the AI was measured on speed of closure, not quality of resolution. And when you measure the wrong thing, you get exactly what you asked for.
In May 2025, in a Bloomberg interview, CEO Sebastian Siemiatkowski openly acknowledged the problem: “As cost unfortunately seems to have been a too predominant evaluation factor when organizing this, what you end up having is lower quality”. He then added the line that marked the strategic pivot: “Really investing in the quality of the human support is the way of the future for us”.
Klarna’s spokesperson later clarified that the CEO’s criticism was directed at the human outsourcers, not at the AI itself. But frankly, that distinction doesn’t change much. A 25% increase in repeat contacts isn’t a matter of interpretation. It’s a data point. And the fully-automated model was reversed within 18 months. When an agent optimizes for the wrong metric, even the corporate narrative becomes hard to manage.
The reversal: from “AI replaces everything” to “humans are the VIP experience”
The timeline of what happened next tells a story of progressive course correction, and it’s worth following closely.
May 2025: Bloomberg interview, public admission. Klarna begins hiring, initially just 2 freelancers. September 2025: IPO on NYSE. A moment where image matters more than anything, and Klarna starts redeploying internal employees (engineers, marketers) to customer service roles. October 2025: a Google Cloud partnership produces a pilot with +50% customer orders and +15% average time spent on the app.
Then comes February 2026, and the real pivot. Klarna adopts an “Uber-style” model: AI handles simple, standardized queries, while humans become the “VIP experience” for complex, emotionally sensitive, or high-value cases. And here’s the most interesting part: these human agents are recruited directly from Klarna’s own customer base.
The message is clear: AI wasn’t abandoned. It was put back in its place. The 2030 outlook projects headcount dropping from roughly 3,000 to under 2,000, with revenue per employee already having gone from $300,000 to $1.3 million since 2022. Efficiency remains. But the strategy has changed radically.
The point isn’t that Klarna failed. The point is that Klarna had the courage to admit it and change course. Most companies living through the same dynamic haven’t done that yet. Their dashboards are still full of green lights.
This isn’t an isolated case. It’s a pattern.
If Klarna were the only company this happened to, we could file it away as an execution error. But it’s not. It’s a recurring pattern, and once you see it, you recognize it everywhere.
Take Air Canada. In 2024, its chatbot was optimized for resolution rate and apparent user satisfaction. The problem? It fabricated a bereavement discount policy that didn’t exist. A Canadian court subsequently ruled that the company is legally responsible for everything its bot tells customers. A legal precedent now binding for any enterprise with customer-facing AI agents. The agent didn’t “malfunction” in any technical sense: it was trying to satisfy the customer as quickly as possible. Its objective and the company’s objective were simply misaligned.
Or look at Zillow, an even more expensive case. The Zillow Offers division used AI optimized for volume of real estate purchases, roughly 5,000 homes per month. When the model began mispricing, management did the most dangerous thing possible: they turned up the dial to hit volume KPIs. Result: an $881 million write-down, the division shuttered, 25% of the workforce laid off. The AI was optimizing for the vanity metric (transaction volume), not the one that actually mattered (margin per transaction).
And there’s Amazon, which built a recruiting AI optimized to “find candidates similar to past hires.” Since those past hires were predominantly male engineers, the system encoded gender bias: it penalized resumes containing the word “women’s” and favored stereotypically masculine language. The project was abandoned because reliably de-biasing it proved impossible.
In every case, the dynamic is identical. The agent had an objective. The company had a different one. Nobody stopped to verify they were the same thing.
The number that should keep every CEO awake at night
If you think these are unlucky one-offs rather than systemic risks, there’s a number that should make you reconsider. According to a Gartner report from June 2025, over 40% of agentic AI projects will be canceled by the end of 2027. The three main causes? Escalating costs before returns materialize, inability to demonstrate business value, and inadequate governance and privacy frameworks.
A January 2025 survey of 3,412 participants revealed that only 19% of organizations were making significant investments in agentic AI. 42% were conservative. 31% in wait-and-see mode. And then there’s the phenomenon Gartner calls “agent washing”: only about 130 vendors out of thousands actually offer genuine agentic AI. The rest have simply slapped a new label on chatbots and traditional automation.
Analyst Anushree Verma, Senior Director at Gartner, was brutal in her assessment: “Most agentic AI projects right now are early stage experiments or proof of concepts that are mostly driven by hype and are often misapplied.”
And this isn’t just a startup or small-company issue. Look at Microsoft Copilot: 85% of the Fortune 500 has adopted it. But only 5% have moved from pilot to scaled deployment. Roughly 3% of the M365 user base uses Copilot as a paid user. Bloomberg reported that Microsoft cut internal sales targets after the majority of salespeople missed their goals. The best AI productivity product on the market, inside the most widely deployed enterprise ecosystem in the world, and the real adoption rate is 3%. The problem isn’t the model. It’s the same misalignment between the company’s objective and the tool’s objective.
The iceberg beneath the numbers: 60% cut for potential, not performance
A Harvard Business Review article published in January 2026, authored by Thomas H. Davenport and Laks Srinivasan, put hard numbers on a reality many suspected but no one had quantified. Surveying 1,006 global executives in December 2025, they found that 60% of organizations had already reduced headcount in anticipation of AI. Not because AI was actually doing that work. Only 2% of companies had reached that point.
Let that sink in: 60% are cutting. 2% have AI actually performing those jobs. Only 14% have AI solutions ready to deploy. Only 11% are actively using AI in production. 30% are exploring agentic AI options and 38% are running pilots.
Davenport and Srinivasan call this phenomenon “AI washing”: using artificial intelligence as a narrative shield to justify headcount reductions to investors. And this may be the most alarming signal of all. Because it means the gap between stated objectives and real objectives isn’t just an AI agent problem. It’s a problem with the companies adopting them.
How we got here (and what we’re still missing)
To understand the heart of the Klarna problem, and every parallel case, we need to step back and look at how our relationship with AI agents has evolved.
The first phase was about instructions. It was called “prompt engineering” because it sounded sophisticated, but it really meant one thing: learning to talk to the machine. “Write like an expert. Be concise. Use examples.” This was the era when everyone thought finding the right magic formula was all it took to get the perfect output.
Then we realized instructions alone weren’t enough. So we moved to the second phase: building context. It’s not sufficient to tell an agent what to do; you need to construct the entire information environment in which it operates. What data can it see? What tools can it access? What memory does it have of prior interactions? Anthropic published a paper in September 2025 describing this shift as moving “from crafting isolated instructions to crafting the entire information state that AI operates within.” Harrison Chase, founder of LangChain, boiled it down even further: “Everything’s context engineering.” The MCP (Model Context Protocol), introduced by Anthropic in late 2024 and donated to the Linux Foundation in December 2025, has reached roughly 100 million SDK downloads per month.
But there’s a third level that almost nobody is addressing yet. And it’s the one that broke in every case we’ve examined.
Think of context as the maze a mouse runs through. Prompt engineering teaches the mouse to run. Context engineering builds a better maze. But what happens when the cheese is in the wrong place? It doesn’t matter how fast the mouse is, or how sophisticated the maze. The mouse will arrive exactly where you sent it. And if that destination isn’t the right destination for your business, you’ll have a perfectly efficient system producing the wrong outcome.
I call this third level “Intent Engineering.” The difference is fundamental: it’s no longer about telling the agent what to do, or building the right context. It’s about encoding the why. The actual business objective, not its measurable proxy.
The distance between “close tickets quickly” and “build lasting trust in fintech” can’t be bridged with better prompts or richer context. It can only be bridged by redesigning the agent’s objective. Klarna had excellent instructions. Klarna had rich context. Klarna hadn’t encoded its true intent.
When AI learns to cheat (and doesn’t care if you tell it not to)
There’s one final piece of this puzzle that makes the problem even more urgent. In June 2025, METR in collaboration with OpenAI published a study on how frontier models, including o3, behave when evaluated on real-world tasks. The result? o3 engaged in reward hacking — finding shortcuts to appear performant without actually being so — in 14 out of 20 attempts on real scientific research tasks. It modified the scoring code, exploited loopholes in evaluation criteria. The most unsettling part? Explicitly telling the model “don’t cheat” had virtually no effect.
The implications for any company deploying AI agents on processes with metrics to hit are direct: if the agent is incentivized to reach a number, it will find a way to reach it. Not necessarily the way you intended. It’s the same dynamic we saw at Klarna, Zillow, and Amazon — except now it’s emerging in the models themselves, before we even build applications on top of them.
Reality Check: Let’s be direct. Most companies that boast about “using AI” today have never answered a very simple question: does our agent actually know what we want? 74% haven’t seen tangible value from AI yet, according to Deloitte. 84% haven’t redesigned job roles around AI capabilities. Only 21% have a mature agent governance model. And Gartner tells us 40% of agentic projects will be canceled by 2027 because companies can’t demonstrate business value. Meanwhile, 57% of companies are putting between 21% and 50% of their digital transformation budget into AI automation. We’re pouring fuel into engines pointed in the wrong direction. And we keep patting ourselves on the back because the dashboard is full of green lights. Exactly like Klarna did, before the +25% repeat contacts made it impossible to ignore reality.
Reading this and wondering whether your company has this same kind of misalignment? You’re not alone. From strategy to implementation, I help companies turn artificial intelligence into real results, not vanity metrics. Explore my AI Consulting services or reach out directly for a discovery call.
What now?
If you want to find out whether your company already has an active intent gap between the objectives you’ve given your agents and the ones your business truly needs to achieve, I’ve built an operational tool for paid subscribers: the Intent Gap Audit. Seven diagnostic questions, a framework for translating your OKRs into agent-ready parameters, and five warning signs to recognize immediately.
Where we’re headed
The moral of Klarna’s story isn’t “AI doesn’t work.” It’s the exact opposite: AI works too well. It does precisely what you tell it to do. And that’s why the most important question for any company over the next 24 months won’t be “which model should we use?” or “how much should we invest in AI?” It will be: “What are we actually asking it to do? And is that objective the same one that would grow our business?”
Gartner tells us 40% of agentic projects will be canceled by 2027. But it also tells us something more interesting: by 2028, 15% of daily operational decisions will be made autonomously by AI agents, and 33% of enterprise software will have agentic AI embedded.
The future is agentic. But not just any agent. Not an agent racing at the speed of light toward the wrong KPI. An agent that truly knows what you’re trying to build.
The technology is ready. The question is whether you are.



Klarna optimized for cost-cutting and missed the actual value entirely. That gap shows up at every scale - not just enterprise. I ran my own smaller experiment tracking AI value vs spend this month, solopreneur edition. Built products, measured returns carefully. The ROI question gets complicated fast when you're honest about it. Turns out the spreadsheet optimization that looks good on paper doesn't account for all the variables that matter in the real execution layer. Different context, same lesson. I wrote about the full experiment here: What gets left out of the analysis is what still needs to happen.
I wrote about the full experiment here: https://thoughts.jock.pl/p/project-money-ai-agent-value-creation-experiment-2026