№ 015 · April 17, 2026

Four agents walk into a browser

Google, OpenAI, Anthropic, and Perplexity have all shipped browser agents. The form factor converged in eighteen months. The benchmarks haven't.

Between October 2024 and early 2026, every major AI lab shipped a browser agent. Google has Mariner. OpenAI has ChatGPT agent. Anthropic has Claude computer use. Perplexity has Comet. The form factor converged faster than the capabilities.

What a browser agent does

A browser agent watches a screen, plans a sequence of actions, and executes them—filling forms, clicking buttons, navigating between sites. The premise: give it a task in plain English, and it handles the clicking for you. The reality: these systems still fail on complex workflows, and the labs know it.

Anthropic shipped first. On October 22, 2024, the company announced computer use as a public beta feature for Claude 3.5 Sonnet. The pitch was simple: “developers can direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking buttons, and typing text.” Early adopters included Canva, Replit, and The Browser Company. The benchmark numbers told a more cautious story.

“At this stage, it is still experimental—at times cumbersome and error-prone.”

That disclaimer came from Anthropic itself. On OSWorld, the benchmark for operating-system-level tasks, Claude 3.5 Sonnet scored 14.9 percent in screenshot-only mode. The next-best system managed 7.8 percent. With additional steps allowed, Claude reached 22 percent. Not great. Just better than the alternatives.

The Operator lifecycle

OpenAI’s Operator launched on January 23, 2025, and shut down seven months later. It scored 38.1 percent on OSWorld and 58.1 percent on WebArena, a web-interaction benchmark. Those numbers beat Claude’s October 2024 results but still fell short of the human-level accuracy OpenAI had implied was coming.

Operator’s deprecation on August 31, 2025, came with the announcement of ChatGPT agent, its replacement. The Operator-to-agent transition happened fast enough that some users never got access to the original. The pattern suggests OpenAI treats browser agents as infrastructure to iterate rather than products to defend.

Mariner and Comet

Google’s Project Mariner remains labeled a “research prototype” even as it ships to Google AI Ultra subscribers in the United States. Unlike Operator, which ran tasks in the user’s browser, Mariner executes on virtual machines—a sandboxing choice that limits access and limits damage. Google’s pitch emphasizes multimodal reasoning: the agent observes the browser display, plans actionable steps, and executes them across sites. One demo shows Mariner navigating from an email to TaskRabbit to hire help for a task mentioned in the message.

The teach-and-repeat feature stands out. “Once agents have learned how to do a task, they can try to replicate the same workflow in the future with minimal input,” Google claims. No benchmark scores have been published for Mariner.

Perplexity’s Comet rounds out the quartet. It appears alongside Perplexity Computer and Perplexity Assistant in the company’s product suite, but primary documentation on Comet’s capabilities remains thin. What’s clear: the company valued at $21.21 billion as of early 2026 considers browser agents part of its product line, not a research experiment.

The convergence and the gap

Four labs. Four browser agents. Eighteen months from Anthropic’s beta to Google’s prototype. The timeline compression is real. The capability gap is real.

OSWorld and WebArena remain the benchmarks to watch, but published scores are sparse. OpenAI’s Operator scored higher than Anthropic’s 2024 Claude—then got deprecated. Google hasn’t released numbers. Perplexity hasn’t either. The labs are shipping faster than they’re measuring, or at least faster than they’re disclosing.

What unites these products is the bet: that controlling a browser is the wedge into controlling a computer, and controlling a computer is the wedge into agentic work that earns subscription revenue. What divides them is execution maturity. Anthropic warns users to stick to low-risk tasks. Google sandboxes everything in virtual machines. OpenAI already killed its first version.

The browser agent form factor has arrived. Browser agent reliability has not.