Claude Opus 4.8 Is Here: Everything New, Every Benchmark, and What Actually Changed

Released today with sharper judgment, cheaper fast mode, and a $965B valuation announcement on the same day. Here's the complete breakdown.

Anthropic released Claude Opus 4.8 today, May 28, 2026 — just 41 days after Opus 4.7. The new flagship model brings improvements across coding, agentic tasks, reasoning, and knowledge work, and it ships at the exact same price as its predecessor. On the same day, Anthropic announced it raised $65 billion in Series H funding at a $965 billion valuation, officially surpassing OpenAI to become the most valuable AI company in the world. Two historic milestones in a single day.

If you only read one thing about Opus 4.8, read this: it's a "modest but tangible improvement" (Anthropic's own words) that meaningfully advances three things — agentic coding, honesty, and alignment — while introducing three new features that change how you work with Claude. It's not the giant leap that the unreleased Claude Mythos promises to be, but it fixes real problems from Opus 4.7 and sets a new bar on benchmarks that matter for autonomous AI work.

Key Takeaway

Claude Opus 4.8 (API ID: claude-opus-4-8) launched May 28, 2026 at unchanged pricing ($5/M input, $25/M output). It improves SWE-Bench Pro from 64.3% to 69.2%, leads OSWorld-Verified at 83.4%, and tops GPT-5.5 and Gemini 3.1 Pro on knowledge work (GDPval-AA 1890). It's roughly 4x less likely to let code flaws pass unremarked. Three new features launched alongside it: dynamic workflows (parallel subagents in Claude Code), effort control (claude.ai and Cowork), and mid-task system entries in the Messages API. Fast mode is now 3x cheaper.

What's New in Claude Opus 4.8?

The headline improvement is agentic capability — Claude's ability to work independently through multi-step tasks using tools. Early testers report sharper judgment, better tool use, and improved reliability on long-running workflows. The model asks the right questions, catches its own mistakes, pushes back when a plan isn't sound, and builds confidence around complex explorations before making big changes. For anyone using Claude as an autonomous agent rather than a chatbot, these are the improvements that matter most.

The second major improvement is honesty. Anthropic trained all its models to avoid making claims they can't support, but AI models have a persistent problem: they jump to conclusions, confidently claiming progress when the evidence is thin. Opus 4.8 is more likely to flag uncertainties about its work and less likely to make unsupported claims. Anthropic's evaluations show it's around four times less likely than Opus 4.7 to allow flaws in code it has written to pass unremarked. It's the first Claude model to score 0% on uncritically reporting flawed results, with a more than ten-fold reduction in overconfidence.

The third improvement is alignment. Anthropic's alignment team concluded that Opus 4.8 "reaches new highs on our measures of prosocial traits like supporting user autonomy and acting in the user's best interest." Its rates of misaligned behavior — deception or cooperation with misuse — are substantially lower than Opus 4.7 and similar to Claude Mythos Preview, Anthropic's best-aligned model. (There's one concerning caveat about evaluation awareness, which we cover in our honesty paradox deep dive.)

Claude Opus 4.8 Benchmarks: How Does It Compare?

Anthropic published benchmarks comparing Opus 4.8 to its predecessor and to competing models. The gains are incremental but consistent, and Opus 4.8 leads on most agentic and knowledge-work tests. Here's how the numbers break down.

Benchmark	Opus 4.8	Opus 4.7	What It Measures
SWE-Bench Pro	69.2%	64.3%	Real-world agentic coding
OSWorld-Verified	83.4%	82.3%	Agentic computer use
Online-Mind2Web	84%	lower	Browser-agent tasks
GDPval-AA	1890	—	Knowledge work (beats GPT-5.5's 1769)
Reasoning w/ tools	57.9%	54.7%	Multidisciplinary reasoning
Terminal-Bench 2.1	74.6%	—	Terminal coding (GPT-5.5 wins at 78.2%)

The honest takeaway: Opus 4.8 leads on most agentic, computer-use, and knowledge-work benchmarks, beating both GPT-5.5 and Gemini 3.1 Pro on GDPval-AA by a wide margin. But it's not a clean sweep — GPT-5.5 still wins Terminal-Bench 2.1 (terminal-heavy coding), scoring 78.2% to Opus 4.8's 74.6%. If your workflow is dominated by long terminal sessions, GPT-5.5 remains competitive. For a complete head-to-head, see our three-way benchmark breakdown.

📬 Getting value from this?

One actionable AI insight per week. Plus a free prompt pack when you subscribe.

Subscribe free →

The Three New Features Launching With Opus 4.8

Opus 4.8 didn't launch alone. Anthropic shipped three features the same day that change how you interact with Claude across products.

Dynamic workflows (Claude Code). Available in research preview for Max, Team, and Enterprise plans, this feature lets Claude plan a large task, dispatch hundreds of parallel subagents that attack the problem from independent angles, deploy adversarial agents to refute findings, and iterate until answers converge — then verify outputs before reporting back. The flagship use case is codebase-scale migrations across hundreds of thousands of lines of code, from kickoff to merge, using the existing test suite as the bar. We break this down fully in our dynamic workflows explainer.

Effort control (claude.ai and Cowork). A new control next to the model selector lets you choose how much effort Claude puts into a response. Higher effort means Claude thinks more frequently and deeply for better responses; lower effort means faster replies that use your rate limits more slowly. This is available on all plans. Our effort controls guide covers when to use each setting.

Mid-task system entries (Messages API). The Messages API now accepts system entries inside the messages array, letting developers update Claude's instructions mid-task without breaking the prompt cache or routing through a user turn. This matters for agents that need to update permissions, token budgets, or environment context mid-run. Details in our API change breakdown.

Pricing and Availability

Claude Opus 4.8 is available everywhere today. Regular pricing is unchanged from Opus 4.7: $5 per million input tokens and $25 per million output tokens. Fast mode (2.5x speed) is priced at $10/M input and $50/M output — but notably, fast mode is now three times cheaper than it was for previous models. Developers access the model via the Claude API using claude-opus-4-8, and the opus alias now routes to it automatically. It's available on Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry, and GitHub Copilot (with a 15x premium request multiplier until usage-based billing launches June 1).

To get the most from Opus 4.8 regardless of how you access it, well-structured prompts produce dramatically better results. The free Prompt Optimizer sharpens your instructions before you send them, and TresPrompt brings one-click optimization directly into Claude, ChatGPT, and Gemini.

📬 Want more like this?

One actionable AI insight per week. Plus a free prompt pack when you subscribe.

Subscribe free →

What's Next: Claude Mythos

Anthropic used the Opus 4.8 announcement to tease what's coming. The company plans to release a new class of model with even higher intelligence than Opus. As part of Project Glasswing, a small number of organizations are already using Claude Mythos Preview for cybersecurity work. Models at that capability level require stronger cyber safeguards before general release, but Anthropic said it's making swift progress and expects to bring Mythos-class models to all customers "in the coming weeks." Opus 4.8's alignment already approaches Mythos Preview levels — a hint of what's coming. Read more in our Mythos timeline analysis.

Frequently Asked Questions

What is the Claude Opus 4.8 API model ID?

The API model ID is claude-opus-4-8. The opus alias now routes to it automatically, so existing integrations using the alias will upgrade. For the 1-million-token context variant, use claude-opus-4-8[1m]. It's available on the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.

How much does Claude Opus 4.8 cost?

Regular usage is $5 per million input tokens and $25 per million output tokens — unchanged from Opus 4.7. Fast mode (2.5x speed) costs $10/M input and $50/M output, which is three times cheaper than fast mode was for previous models. Pricing is identical across cloud platforms.

Is Claude Opus 4.8 better than GPT-5.5?

It depends on the task. Opus 4.8 leads on agentic coding (SWE-Bench Pro), computer use (OSWorld 83.4%), browser tasks (Online-Mind2Web 84%), and knowledge work (GDPval-AA 1890 vs GPT-5.5's 1769). But GPT-5.5 still wins Terminal-Bench 2.1 (78.2% vs 74.6%) for terminal-heavy coding. For most agentic and knowledge work, Opus 4.8 is stronger; for long terminal sessions, GPT-5.5 remains competitive.

Should I upgrade from Opus 4.7 to 4.8?

For most users, yes — it's the same price with better benchmarks, dramatically improved honesty, and fixes for Opus 4.7's comment-verbosity and tool-calling issues. The upgrade is automatic if you use the opus alias. The main reason to hesitate: if your workflows are heavily tuned to 4.7's behavior, retest your prompts since the model's judgment and verbosity have changed. See our upgrade decision guide.

What are dynamic workflows in Claude Opus 4.8?

Dynamic workflows is a Claude Code feature (research preview, Max/Team/Enterprise) that lets Claude plan a large task and run hundreds of parallel subagents in a single session. The subagents attack problems from independent angles, adversarial agents try to refute findings, and the system iterates until answers converge before reporting back. The main use case is codebase-scale migrations across hundreds of thousands of lines of code.

Disclosure: Some links in this article are affiliate links. We only recommend tools we've personally tested and use regularly. See our full disclosure policy.