The AI Architect

The Most Powerful AI Model Ever Released. And Why You Should Probably Use a Cheaper One.

Fable 5 is here. Real production data from 50 trillion monthly tokens says you should use something else for most of your work.

Matija Vidmar
Jun 10, 2026
∙ Paid

TL;DR

  • On June 9, 2026, Anthropic released Claude Fable 5, the most capable general-purpose model ever made available to the public, priced at $50 per million output tokens.

  • The same week: Uber had burned through its entire annual Claude Code budget by Q1 2026, and Amazon shut down KiroRank over unproductive token consumption.

  • Real production data from Vercel shows that the companies winning the AI game are not using the most powerful model. They are matching the right model to the right task. The difference shows up on the invoice.


Uber burned through its entire annual Claude Code budget by the end of Q1 2026. That is not a footnote. Amazon shut down KiroRank, its internal software development tool, with a blunt diagnosis: unproductive tokenmaxxing, meaning tokens consumed without proportional output. That was the news the week of June 9.

The same week, Anthropic released the most powerful general-purpose AI model ever made available to the public.

The paradox is immediate and worth keeping in mind for everything that follows. The company using the most expensive and most advanced AI does not automatically win. The company that knows which AI to use for what wins. These are two different games. Most businesses are still playing the first one while believing they are playing the second.

What Claude Fable 5 Is (and What Mythos Has to Do With It)

Fable 5 and Mythos 5 share the same underlying model. The naming is not accidental: “Fable” comes from the Latin fabula, the story, the Latin counterpart of the Greek mythos. Two names for two versions of the same core.

The difference is in the safeguards. Fable 5 has active safety classifiers across three areas: offensive cybersecurity, biology and chemistry, and attempts to extract the model’s parameters to train competing models. When one of these classifiers triggers, the response is automatically handled by Claude Opus 4.8. The user is notified. The system is transparent.

Mythos 5 does not carry these restrictions, or has them disabled in certain areas. It is reserved for selected partners under Project Glasswing, a collaboration with the US government on cyber defense and biological research.

The numbers: a one-million-token context window, a maximum output of 128,000 tokens, and knowledge current to January 2026. API pricing: $10 per million input tokens and $50 per million output tokens, exactly double the Opus 4.8 rates. On subscription plans (Pro, Max, Team, and Enterprise), Fable 5 is included at no extra cost until June 22, 2026. From June 23, additional usage credits are required.
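To make the price gap concrete, here is a minimal sketch of the per-workload arithmetic. It assumes API billing is simply tokens multiplied by the per-million rate, with input and output billed separately. The Opus 4.8 input price of $5 is inferred from the "exactly double" statement, the dictionary keys are informal labels rather than official API model identifiers, and the workload size is a hypothetical example, not a measurement.

```python
# Per-million-token API prices in USD, as quoted in this article.
# Assumption: Opus 4.8 input price ($5) is inferred from "exactly double Opus 4.8".
PRICES = {
    "fable-5": {"input": 10.00, "output": 50.00},
    "opus-4.8": {"input": 5.00, "output": 25.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload, billing input and output tokens separately."""
    rates = PRICES[model]
    input_cost = (input_tokens / 1_000_000) * rates["input"]
    output_cost = (output_tokens / 1_000_000) * rates["output"]
    return input_cost + output_cost


# Hypothetical workload: 2M input tokens and 500k output tokens.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 2_000_000, 500_000):.2f}")
# fable-5: $45.00
# opus-4.8: $22.50
```

The same workload costs twice as much on Fable 5, because both the input and the output rates are doubled.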

Simon Willison, one of the most reliable technical commentators in the field, wrote after his first full day of testing: “Something beastly. Slow, expensive, and it chewed through everything I threw at it without any problems.” The bill for that single day: $110.42 in tokens, against a $100 monthly subscription.

What They Are Not Telling You

The classifiers on cybersecurity and biology are public, declared, and transparent. They trigger, the system tells you, Opus 4.8 takes over. That is the full picture on that layer.

There is a second layer, documented in the Fable 5 system card but not in the main announcement. It covers queries related to AI model development, distributed training infrastructure, and ML accelerator design. Here the model does not fall back to Opus 4.8. It stays as Fable 5, but its effectiveness is quietly reduced through prompt modification, steering vectors, or parameter-efficient fine-tuning. The user receives no notification.

Nathan Lambert, independent AI researcher and author of the Interconnects newsletter, wrote in his commentary on the system card: “An AI model that automatically becomes less intelligent without telling me is categorically misaligned AI.”

His point is not the scandal. It is the logic: if all safeguards take the same form, the policy is consistent. When one is visible and another is hidden, the second one has more to do with Anthropic’s competitive position than with industry safety.

This is not a reason to stop using Claude. It is a reason to remember that AI models are not neutral tools. They have embedded economic interests. That applies to every provider, not only Anthropic.

The Five Levels (and What They Actually Cost)

The Claude ecosystem now has five distinct rungs:

  • Mythos 5: Project Glasswing partners only

  • Fable 5: $50 per million output tokens

  • Opus 4.8: $25 per million output tokens

  • Sonnet 4.6: $15 per million output tokens

  • Haiku 4.5: $5 per million output tokens

For anyone working with the API, the output cost gap between Fable and Haiku is 10 to 1. Add DeepSeek V4 Flash at $0.28 per million output tokens, and the effective market range reaches nearly 180 to 1.
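The ratios above follow directly from the listed output prices. A short sketch that recomputes them, using only the per-million output prices quoted in this article:

```python
# Output prices per million tokens (USD), as listed in this article.
OUTPUT_PRICE = {
    "Fable 5": 50.00,
    "Opus 4.8": 25.00,
    "Sonnet 4.6": 15.00,
    "Haiku 4.5": 5.00,
    "DeepSeek V4 Flash": 0.28,
}


def output_cost_ratio(expensive: str, cheap: str) -> float:
    """How many times more one model's output tokens cost than another's."""
    return OUTPUT_PRICE[expensive] / OUTPUT_PRICE[cheap]


for cheap in ("Opus 4.8", "Sonnet 4.6", "Haiku 4.5", "DeepSeek V4 Flash"):
    print(f"Fable 5 vs {cheap}: {output_cost_ratio('Fable 5', cheap):.1f} to 1")
# Fable 5 vs Opus 4.8: 2.0 to 1
# Fable 5 vs Sonnet 4.6: 3.3 to 1
# Fable 5 vs Haiku 4.5: 10.0 to 1
# Fable 5 vs DeepSeek V4 Flash: 178.6 to 1
```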

The question is not which model is most powerful. It is which level of power this specific task actually requires.

How Companies Are Actually Using AI (The Data That Changes Everything)

Vercel AI Gateway routes tens of trillions of tokens per month between real applications and AI providers. Their May 2026 data is the only reliable picture of what happens when companies stop theorizing and start paying real bills.

DeepSeek V4 Flash went from under 1% to 17% of total tokens in a single month. Third provider by volume, ahead of OpenAI. Its share of spend: close to 1%.

Anthropic, in the same period, grew its token share from 26% to 32% and its spend share from 61% to 65%. In high-stakes use cases, it holds 70% to 80% of spend: AI app generation, back-office agents, coding agents.

In the software development agent segment, the contrast is even sharper: DeepSeek handles 49% of tokens and 4% of spend. Anthropic handles 28% of tokens and 70% of spend.

“DeepSeek holds 17% of tokens. 1% of spend. Anthropic holds 32% of tokens. 65% of spend. That is the AI market in 2026: the winners do not spend more. They spend smarter.”

The practical reading: companies are not using less AI. They are using expensive AI for work that matters and cheap AI for volume. Total spend grew 43% month over month.
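One way to read these shares is as a relative spend-per-token index: divide each provider's spend share by its token share, then compare the two providers. The sketch below does that with the May 2026 figures quoted above. The shares are rounded, and DeepSeek's overall spend share is only "close to 1%", so treat the results as orders of magnitude, not exact prices.

```python
# (token share %, spend share %) from the Vercel AI Gateway figures quoted above.
# Assumption: DeepSeek's overall spend share "close to 1%" is approximated as 1.0.
SHARES = {
    "overall": {"DeepSeek": (17, 1.0), "Anthropic": (32, 65)},
    "coding agents": {"DeepSeek": (49, 4), "Anthropic": (28, 70)},
}


def spend_per_token(token_share: float, spend_share: float) -> float:
    """Spend share over token share: above 1 means pricier per token than the market average."""
    return spend_share / token_share


def relative_price(segment: str) -> float:
    """How many times more is spent per Anthropic token than per DeepSeek token."""
    deepseek = spend_per_token(*SHARES[segment]["DeepSeek"])
    anthropic = spend_per_token(*SHARES[segment]["Anthropic"])
    return anthropic / deepseek


for segment in SHARES:
    print(f"{segment}: ~{relative_price(segment):.0f}x per token")
# overall: ~35x per token
# coding agents: ~31x per token
```

In both segments, each Anthropic token carries roughly 30 to 35 times the spend of a DeepSeek token, which is the gap the "expensive AI for work that matters" pattern is paying for.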

From Tokenmaxxing to Orchestration

As I described in my analysis of the token-ranking phenomenon inside companies, there was a phase where companies measured AI value by tokens consumed. It was a trap: you were not measuring results, you were measuring usage.

The invoices arrived. Uber burned its annual Claude Code budget by Q1 2026. Amazon closed KiroRank after tokens consumed failed to produce proportional output. Business Insider called it unproductive tokenmaxxing.

Companies are now learning. Haiku for high-volume automation. Sonnet for daily production work. Opus for complex tasks requiring judgment. Fable only when quality is non-negotiable and the budget allows.

This shift is not voluntary. It is the response to the bills that landed.
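The tiering described above can be sketched as a simple rule-based router. This is a hypothetical illustration, not any company's actual system: the `Task` fields are invented for the example, and the fallback to Opus 4.8 when quality is critical but the budget does not allow Fable 5 is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task profile; field names are illustrative, not from any tool."""
    high_volume: bool = False             # repetitive, large-batch automation
    needs_judgment: bool = False          # complex work requiring judgment
    quality_critical: bool = False        # quality is non-negotiable
    budget_allows_top_tier: bool = False  # can this task justify Fable 5 pricing?


def pick_model(task: Task) -> str:
    """Route a task to a Claude tier using the rule of thumb in the text."""
    if task.quality_critical:
        # Assumption: without budget for Fable 5, fall back to Opus 4.8.
        return "Fable 5" if task.budget_allows_top_tier else "Opus 4.8"
    if task.needs_judgment:
        return "Opus 4.8"
    if task.high_volume:
        return "Haiku 4.5"
    return "Sonnet 4.6"  # default for daily production work


print(pick_model(Task(high_volume=True)))     # Haiku 4.5
print(pick_model(Task()))                     # Sonnet 4.6
print(pick_model(Task(needs_judgment=True)))  # Opus 4.8
print(pick_model(Task(quality_critical=True, budget_allows_top_tier=True)))  # Fable 5
```

The checks run from most to least demanding, so a high-volume task that also needs judgment goes to Opus 4.8 rather than Haiku 4.5. The point is that the decision happens before the call, not after the invoice.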


Still using the most powerful model for everything, or not sure how to match the right model to the right task? I help businesses build AI workflows that route work to the right model based on what actually matters. Explore my AI Consulting services or reach out directly.


The Point No Benchmark Mentions

The “DeepSeek vs. Claude” debate that has dominated the industry for months is the wrong comparison for most people reading it. Real data: Claude Sonnet 4.6 scores 79.6% on SWE-bench Verified, within one point of DeepSeek V4 Pro (80.6%), at $15 per million output tokens against $25 for Opus 4.8.

DeepSeek V4 Flash at $0.28 per million output tokens is real and capable for non-sensitive tasks. But its data is stored on servers in China, subject to Chinese cybersecurity law that permits government access without a court order and without user notification. The US Department of Defense banned it on government devices in February 2026. This is not ideology: it is a compliance question. For any work that touches client data, confidential strategy, or regulated areas (personal data under GDPR, financial regulation, or healthcare), do not use it.

For tasks involving public information, general research, and high-volume processing with no sensitive data, the practical risk is low. As I explored in depth in my article on the economics of AI skills, the simplest operational rule is this: do not put into DeepSeek anything you would not post in a public forum.
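The rule above can be written as a small pre-flight gate that runs before anything is sent to DeepSeek. This is a hypothetical sketch: the `Payload` fields are invented for illustration, and the conservative default (block unless explicitly marked as postable) is an assumption layered on top of the "public forum" test.

```python
from dataclasses import dataclass


@dataclass
class Payload:
    """Hypothetical description of what a task would send to the model."""
    client_data: bool = False
    confidential_strategy: bool = False
    regulated: bool = False          # e.g. GDPR personal data, financial, healthcare
    publicly_postable: bool = False  # would you post it in a public forum?


def deepseek_allowed(payload: Payload) -> bool:
    """Allow DeepSeek only for non-sensitive content you would post publicly."""
    sensitive = (
        payload.client_data or payload.confidential_strategy or payload.regulated
    )
    # Conservative default: anything not explicitly marked postable is blocked.
    return payload.publicly_postable and not sensitive


print(deepseek_allowed(Payload(publicly_postable=True)))                    # True
print(deepseek_allowed(Payload(client_data=True, publicly_postable=True)))  # False
print(deepseek_allowed(Payload()))                                          # False
```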


Knowing that you need to orchestrate is half the work. Knowing how to do it, with which model for which specific task, with prompts already calibrated for non-technical professionals: that is what we cover below.

Want to learn how to use AI in your work, without depending on updates and without following courses that expire every six months? I built a structured program that is updatable by design. Discover the “From user to orchestrator” course.


Reality Check. The problem is not that companies are using the wrong AI. It is that most have not yet understood that using the right one consistently requires a deliberate decision before each task, not after the invoice arrives.


The most useful AI model is not the most powerful one that exists. It is the one most suited to the task in front of you. Fable 5 is real, impressive, and for most of the work you do every day, it is overkill. The real skill is not accessing the most advanced model. It is knowing when not to use it.

PAID SECTION

You have seen the market picture. Now the operational part.

What’s inside:

  • The 4-question framework that lets you choose the right model in under 30 seconds, for any task

  • Model by model, without jargon: when to use Haiku, Sonnet, Opus, and Fable, with real examples in communications, marketing, research, and strategy

  • Pre-written and annotated prompts for each model tier, calibrated for non-technical professionals

  • The DeepSeek rule: when to leave the Claude ecosystem, when it is a risk, and how to evaluate it in 10 seconds

  • The hidden saving almost no one talks about: prompt caching, which can cut input costs by up to 90% on repetitive tasks without changing anything in your process

  • The final matrix: a task-by-task visual framework you can keep open while you work

The pattern that emerges from this data is not “use the cheapest model.” It is more nuanced than that, and it changes the question you should ask yourself before every task.


© 2026 Matija Vidmar