Anthropic ships Opus 4.7, and nobody quite understands what 'xhigh' means yet
Anthropic releases Opus 4.7 with adaptive thinking, 128k max output, and breaking API changes, all at unchanged pricing. Customer benchmarks show double-digit gains.
Anthropic shipped Claude Opus 4.7 on April 16, 2026, at the same $5/$25 per million tokens as its predecessor. In a market where capability gains typically arrive with price increases, this is the notable fact: the ceiling went up, the bill didn’t.
The model brings three material changes. First, 128,000 max output tokens—double what Sonnet 4.6 offers. Second, adaptive thinking is now required; the old manual budget_tokens approach returns a 400 error. Third, a new effort parameter replaces the token-counting approach to thinking depth, with settings that run from low to what developers are calling “xhigh.”
That last point is where the confusion enters. Anthropic’s documentation confirms the effort parameter controls thinking depth, but the company hasn’t published the exact levels in the material available at launch. Developers discovered the high end empirically. What matters for production use is simpler: at equivalent effort settings, Opus 4.7 outperforms its predecessor by margins that showed up immediately in customer benchmarks.
What the numbers say
Cursor’s internal benchmark, CursorBench, measured a 12-point jump: 70% versus 58% for Opus 4.6. Notion reported +14% accuracy with fewer tokens and a third of the tool errors—and noted it’s “the first model to pass our implicit-need tests.” Rakuten’s engineering team saw 3x more production tasks resolved on their custom SWE-Bench variant.
“Low-effort Opus 4.7 is roughly equivalent to medium-effort Opus 4.6.” Caitlin Colgrove, Co-Founder and CTO of Hex
That single sentence from Hex’s CTO captures the efficiency gain in practical terms. Teams can dial down the compute and match what they had before, or dial up and get capabilities that weren’t available at any setting.
On legal benchmarks, Harvey reported 90.9% on BigLaw Bench at high effort. Cognition, the team behind the Devin coding agent, offered the most telling endorsement of long-horizon reliability: the model “works coherently for hours, pushes through hard problems rather than giving up.”
The breaking change developers need to know
For teams upgrading from Opus 4.6, the API change is not optional. The old pattern—thinking: {type: "enabled", budget_tokens: N}—no longer works. Anthropic now requires thinking: {type: "adaptive"} with the effort parameter. The display field also defaults to “omitted” rather than returning thinking content; developers must explicitly request display: "summarized" to see the model’s reasoning.
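As a rough sketch of the before-and-after, the two request bodies might look like the following. Field names (thinking, budget_tokens, effort, display) are taken from the description above; the exact payload shape, the placement of effort at the top level, and the model ID strings are assumptions, not confirmed wire format.

```python
# Hedged sketch of the Opus 4.6 -> 4.7 request change. Field names follow
# the article; exact payload shape and model IDs are assumptions.

OLD_PAYLOAD = {  # Opus 4.6 style: manual token budget. Now returns a 400 error.
    "model": "claude-opus-4-6",
    "thinking": {"type": "enabled", "budget_tokens": 8192},
}

NEW_PAYLOAD = {  # Opus 4.7 style: adaptive thinking plus a semantic effort level.
    "model": "claude-opus-4-7",
    "thinking": {"type": "adaptive"},
    "effort": "high",          # semantic depth instead of a token budget
    "display": "summarized",   # default is "omitted"; request reasoning explicitly
}
```

Note that display must be set explicitly: a team that relied on reading thinking content under 4.6 will silently stop receiving it after upgrading unless they opt in.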
Anthropic’s release notes flag this as a breaking change and point to a migration guide. The deprecation of manual extended thinking suggests the company believes semantic effort levels—letting the model determine how much thinking a task needs—outperform developer-specified token budgets in practice. Whether that’s true for all use cases remains to be seen.
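For teams with many call sites, the mechanical part of the migration can be centralized in a small helper. This is a minimal sketch under the same assumptions as above (field names from the article; the "medium" starting effort is a hypothetical default, not documented behavior), and it is no substitute for the official migration guide:

```python
def migrate_thinking_config(payload: dict) -> dict:
    """Rewrite an Opus 4.6-era request body into the 4.7 shape (sketch only)."""
    out = dict(payload)
    thinking = out.get("thinking", {})
    if thinking.get("type") == "enabled":
        # Drop the manual budget_tokens; adaptive thinking decides depth itself.
        out["thinking"] = {"type": "adaptive"}
        # "medium" is a hypothetical starting point, not a documented default.
        out.setdefault("effort", "medium")
    # Reasoning is omitted by default on 4.7; request the summary explicitly
    # if downstream code reads thinking content.
    out.setdefault("display", "summarized")
    return out

legacy = {
    "model": "claude-opus-4-6",
    "thinking": {"type": "enabled", "budget_tokens": 8192},
}
print(migrate_thinking_config(legacy))
```

A helper like this also gives teams one place to adjust once the real migration guide settles questions the launch material leaves open, such as the full list of effort levels.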
The broader context
Opus 4.7 arrives while Anthropic tests cybersecurity safeguards through its Cyber Verification Program, part of the Project Glasswing initiative. The company describes Opus 4.7 as “less broadly capable than Claude Mythos Preview,” the more powerful model still in limited release. The implication: Anthropic is shipping what it can ship safely, holding back what it can’t verify yet.
Claude Opus 4 (the original, pre-point-releases) and Sonnet 4 will be deprecated on June 15, 2026. Teams still on those models have two months to migrate.
What to watch
The effort parameter’s upper bound—and whether “xhigh” becomes official terminology or stays as developer shorthand—will matter for teams building cost models around thinking depth. Anthropic held pricing steady this time; they rarely hold anything steady twice. The next release will tell us whether this was strategy or circumstance.