Compare three approaches to AI agent browser automation. Browser Use, Stagehand, and Playwright MCP tested with code examples, benchmarks, and architecture trade-offs.
TL;DR: Browser Use is a Python agent framework that gives any LLM full browser control through vision and DOM analysis — best for complex multi-step web workflows. Stagehand is a TypeScript SDK with three clean primitives (act, extract, observe) that turns natural language into browser actions — best for structured data extraction and simple automation. Playwright MCP exposes Playwright as MCP tools for any compatible agent — best when you already have an agent framework and need browser capabilities as a pluggable tool layer. Choose Browser Use for autonomous multi-page agent workflows, Stagehand for developer-friendly extraction pipelines, and Playwright MCP for integrating browser access into existing MCP-based agents like Claude Code or Cline.
Giving AI agents the ability to browse the web is the single biggest unlock in agent capability since tool use. An agent that can read documentation, fill out forms, compare products, and navigate dashboards goes from being a clever chatbot to a genuine autonomous worker. In 2026, three open-source projects have emerged as the dominant approaches to this problem, each with a fundamentally different architecture.
Browser Use treats the browser as a full environment for an autonomous agent, with a built-in planning loop that decides what to do next. Stagehand treats the browser as a programmable API where developers write the control flow and the LLM handles element targeting. Playwright MCP treats the browser as a set of tools exposed through the Model Context Protocol, letting any MCP-compatible agent decide how to use them.
The choice between them depends on how much autonomy you want the LLM to have, what language your stack uses, and whether you are building a standalone browser agent or adding browser capabilities to an existing agent framework.
Browser Use is an open-source Python framework with over 65,000 GitHub stars that turns any LLM into a web-browsing agent. Built on Playwright, it provides a complete agent loop: the LLM observes the current page state (through screenshots and DOM analysis), plans the next action, executes it through Playwright, and verifies the result. It supports any LLM provider (OpenAI, Anthropic, Google, local models) and handles complex scenarios like multi-tab browsing, file uploads and downloads, persistent authentication, and custom action injection. Browser Use is designed for fully autonomous web workflows where the agent decides what to do with minimal human guidance.
Stagehand is a TypeScript SDK by Browserbase with over 15,000 GitHub stars that adds AI-powered automation on top of Playwright. Rather than running an autonomous agent loop, Stagehand exposes three primitives: act() converts natural language instructions into browser actions, extract() pulls structured data from pages using a Zod schema, and observe() returns available actions on the current page. The developer writes the control flow (navigate here, extract this, click that), while the LLM handles the hard part of identifying the right DOM elements. Stagehand uses a hybrid vision + DOM approach for element targeting and integrates with Browserbase's cloud browser infrastructure for scaling.
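The idea behind schema-driven extraction can be illustrated without Stagehand itself. The sketch below is in Python rather than TypeScript and uses a dataclass in place of a Zod schema; `Story`, `parse_extraction`, and the sample JSON are hypothetical names invented for illustration, not part of Stagehand's API. It shows only the validation step that turns a model's raw JSON answer into typed records.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Story:
    """Hypothetical schema: the shape we expect each extracted item to have."""
    title: str
    url: str


def parse_extraction(raw_json: str, schema=Story) -> list:
    """Validate a model's JSON answer against the schema and build typed records."""
    expected = {f.name: f.type for f in fields(schema)}
    records = []
    for i, item in enumerate(json.loads(raw_json)):
        # Reject items with missing or unexpected keys.
        if set(item) != set(expected):
            raise ValueError(f"item {i}: expected keys {sorted(expected)}, got {sorted(item)}")
        # Reject values of the wrong type.
        for key, expected_type in expected.items():
            if not isinstance(item[key], expected_type):
                raise ValueError(f"item {i}: field {key!r} is not {expected_type.__name__}")
        records.append(schema(**item))
    return records


# Stand-in for what an LLM might return for "extract the stories on this page".
raw = '[{"title": "Example story", "url": "https://example.com/1"}]'
stories = parse_extraction(raw)
print(stories)
```

The benefit is that malformed model output fails loudly at the boundary instead of leaking into downstream code.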
Playwright MCP is Microsoft's official MCP server that exposes Playwright browser automation as Model Context Protocol tools. Released as part of the Playwright ecosystem, it provides tools like browser_navigate, browser_click, browser_type, browser_snapshot, and browser_screenshot that any MCP-compatible client can call. It supports two modes: a snapshot mode that uses accessibility tree representations (lower token cost) and a vision mode that sends screenshots. Playwright MCP does not include any agent logic — it is purely a tool layer that lets existing agents like Claude Code, Cline, or custom LangGraph agents control a browser through standardized MCP tool calls.
The fundamental question these three tools answer differently is: who drives the browser automation loop?
In Browser Use, the LLM drives. You give it a task ("book a flight from SFO to JFK for next Tuesday under $400"), and the agent autonomously navigates websites, fills forms, compares options, and completes the booking. The developer provides the goal; the agent figures out the steps. Internally, Browser Use maintains a conversation history with the LLM, sending page state (screenshots and/or extracted DOM) at each step and receiving the next action to execute.
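The loop described above can be sketched with stand-ins for both the browser and the model. This is not Browser Use's implementation: `FakeBrowser`, `Action`, `run_agent`, and `scripted_planner` are hypothetical names, and the "planner" is a fixed script playing the role of the LLM. The point is only the shape of the loop: observe, append to history, ask for the next action, execute, repeat.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    name: str            # e.g. "navigate", "type", "click", "done"
    argument: str = ""


@dataclass
class FakeBrowser:
    """Stand-in for a real browser: tracks a URL and a log of executed actions."""
    url: str = "about:blank"
    log: list = field(default_factory=list)

    def observe(self) -> str:
        return f"url={self.url} actions_so_far={len(self.log)}"

    def execute(self, action: Action) -> None:
        if action.name == "navigate":
            self.url = action.argument
        self.log.append(f"{action.name}:{action.argument}")


def run_agent(task, plan_next, browser, max_steps=10):
    """Observe, plan, act until the planner says 'done' or the step budget runs out."""
    history = [f"task: {task}"]
    for _ in range(max_steps):
        history.append(f"state: {browser.observe()}")   # send page state
        action = plan_next(history)                      # receive next action
        if action.name == "done":
            break
        browser.execute(action)
        history.append(f"action: {action.name} {action.argument}")
    return browser.log


def scripted_planner(history):
    """Stub for the LLM: replays a fixed script based on how many actions ran."""
    script = [
        Action("navigate", "https://example.com/flights"),
        Action("type", "SFO to JFK"),
        Action("click", "Search"),
    ]
    steps_taken = sum(1 for entry in history if entry.startswith("action:"))
    return script[steps_taken] if steps_taken < len(script) else Action("done")


browser = FakeBrowser()
executed = run_agent("find a flight", scripted_planner, browser)
print(executed)
```

Note that the history grows with every step, which is one reason an agent loop of this kind consumes more tokens than a developer-scripted flow.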
In Stagehand, the developer drives. You write the navigation flow and use Stagehand's primitives to interact with elements using natural language instead of CSS selectors. The LLM is a targeting engine, not a planner. This gives you predictable execution with AI-powered flexibility in element identification.
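To make the "targeting engine, not a planner" distinction concrete, here is a minimal Python sketch. It does not use Stagehand: `Element`, `resolve_target`, and `act` are hypothetical, and a simple word-overlap heuristic stands in for the LLM. The control flow (which instruction runs when) is ordinary developer code; only the mapping from instruction to element is delegated.

```python
from dataclasses import dataclass


@dataclass
class Element:
    selector: str
    label: str


def resolve_target(instruction: str, elements: list) -> Element:
    """Stub for the LLM targeting step: pick the element whose label
    shares the most words with the instruction."""
    words = set(instruction.lower().split())

    def overlap(el: Element) -> int:
        return len(words & set(el.label.lower().split()))

    best = max(elements, key=overlap)
    if overlap(best) == 0:
        raise LookupError(f"no element matches: {instruction!r}")
    return best


def act(instruction: str, elements: list, clicked: list) -> str:
    """Resolve the instruction to one element and record a click on it."""
    target = resolve_target(instruction, elements)
    clicked.append(target.selector)
    return target.selector


# A toy page: in a real tool these would come from the live DOM.
page = [
    Element("#q", "search box"),
    Element("button.submit", "submit search"),
    Element("a.login", "log in"),
]

# The developer writes the sequence; the resolver only finds the elements.
clicked = []
act("click the log in link", page, clicked)
act("click the submit button", page, clicked)
print(clicked)
```

Because the sequence of steps is fixed in code, a run either follows the expected path or raises, which is what makes this style predictable.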
In Playwright MCP, neither the library nor the developer drives — the connected agent drives. Playwright MCP simply exposes browser actions as MCP tools. When Claude Code or another MCP client needs to interact with a web page, it calls tools like browser_navigate and browser_click. The agent's own reasoning determines what to do; Playwright MCP just executes.
Once configured, the agent calls tools like browser_navigate, browser_snapshot, and browser_click.
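On the wire, an MCP tool invocation is a JSON-RPC 2.0 request with the method `tools/call`. The Python sketch below builds such messages for the tool names mentioned in this article. The helper `tool_call` is hypothetical, and the argument names (`url`, `element`, `ref`) are assumptions about the server's input schema that should be checked against the tool list the server actually advertises.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need unique ids per session


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP 'tools/call' request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument names below are assumptions, not confirmed by this article.
navigate = tool_call("browser_navigate", {"url": "https://news.ycombinator.com"})
snapshot = tool_call("browser_snapshot", {})
click = tool_call("browser_click", {"element": "first story link", "ref": "e12"})

print(json.dumps(navigate, indent=2))
```

In practice the MCP client library constructs these messages; the agent only chooses the tool name and arguments.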
How each tool understands web pages determines its reliability and token cost.
Browser Use sends the full page state to the LLM at each step. By default, it extracts a simplified DOM representation that strips away irrelevant HTML attributes and invisible elements, then adds bounding-box coordinates for interactive elements. It can optionally send screenshots for vision-based understanding. This dual approach is thorough but expensive — each step can consume 2,000-8,000 tokens depending on page complexity.
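A rough idea of what "simplified DOM" means can be shown with the Python standard library. This is not Browser Use's extractor: `Simplifier`, `simplify`, and the attribute allow-list are hypothetical choices for illustration. It drops hidden subtrees, keeps only interactive elements, strips attributes outside a small allow-list, and assigns each element an index. Bounding boxes are omitted because they require a live browser, and the sketch assumes well-formed HTML.

```python
from html.parser import HTMLParser

INTERACTIVE = {"a", "button", "input", "select", "textarea"}
KEEP_ATTRS = {"href", "name", "type", "placeholder", "value"}
VOID = {"input", "br", "img", "hr", "meta", "link"}  # tags with no end tag


class Simplifier(HTMLParser):
    def __init__(self):
        super().__init__()
        self.elements = []    # interactive, visible elements in document order
        self._stack = []      # per open tag: is it (or an ancestor) hidden?
        self._current = None  # interactive element currently collecting text

    @staticmethod
    def _is_hidden(attrs: dict) -> bool:
        style = (attrs.get("style") or "").replace(" ", "").lower()
        return "hidden" in attrs or "display:none" in style

    def handle_starttag(self, tag, attr_list):
        attrs = dict(attr_list)
        hidden = bool(self._stack and self._stack[-1]) or self._is_hidden(attrs)
        if tag not in VOID:
            self._stack.append(hidden)
        if hidden or tag not in INTERACTIVE:
            return
        element = {
            "index": len(self.elements),
            "tag": tag,
            "attrs": {k: v for k, v in attrs.items() if k in KEEP_ATTRS},
            "text": "",
        }
        self.elements.append(element)
        if tag not in VOID:
            self._current = element

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self._stack:
            self._stack.pop()
        if self._current is not None and self._current["tag"] == tag:
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data.strip()


def simplify(html: str) -> list:
    parser = Simplifier()
    parser.feed(html)
    parser.close()
    lines = []
    for el in parser.elements:
        attrs = "".join(f' {k}="{v}"' for k, v in el["attrs"].items())
        lines.append(f'[{el["index"]}] <{el["tag"]}{attrs}>{el["text"]}')
    return lines


sample = """
<div class="nav" data-track="x1">
  <a href="/login" class="btn btn-primary" data-id="7">Log in</a>
  <div style="display: none"><button>Secret</button></div>
  <input type="text" name="q" placeholder="Search" class="form-control">
  <button onclick="go()">Go</button>
</div>
"""
simplified = simplify(sample)
print("\n".join(simplified))
```

The indexed lines are what a model would refer to when choosing an action, for example "click element 2".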
Stagehand uses a hybrid approach called "DOM + Vision." It first analyzes the page DOM to identify candidate elements, then optionally uses vision (screenshots with element annotations) to verify targeting. The observe() method returns a list of actionable elements with descriptions, and act() uses this information to execute the right action. By narrowing candidates through DOM analysis before applying vision, Stagehand achieves strong accuracy with moderate token usage — typically 500-2,000 tokens per action.
Playwright MCP offers two modes. Snapshot mode converts the page's accessibility tree into a text representation where each interactive element gets a reference ID (like [ref=e12]). This is compact (typically 500-3,000 tokens) and works well for structured pages. Vision mode sends screenshots and requires a vision-capable model. Snapshot mode is the default and handles most use cases at a lower token cost than vision mode.
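The reference-ID idea can be sketched in a few lines of Python. The tree structure, the role list, and the output layout below are illustrative assumptions, not Playwright MCP's exact snapshot format; `render_snapshot` is a hypothetical helper. It walks an accessibility-style tree, emits one indented text line per node, and tags interactive nodes with a ref that can later be mapped back to the node.

```python
INTERACTIVE_ROLES = {"link", "button", "textbox", "checkbox", "combobox"}


def render_snapshot(root: dict):
    """Return (text, refs): an indented text view of the tree, plus a map
    from ref id to node for every interactive element."""
    lines = []
    refs = {}

    def walk(node: dict, depth: int) -> None:
        line = "  " * depth + "- " + node["role"]
        if node.get("name"):
            line += f' "{node["name"]}"'
        if node["role"] in INTERACTIVE_ROLES:
            ref = f"e{len(refs) + 1}"
            refs[ref] = node
            line += f" [ref={ref}]"
        lines.append(line)
        for child in node.get("children", []):
            walk(child, depth + 1)

    walk(root, 0)
    return "\n".join(lines), refs


# Hypothetical accessibility tree for a small page.
tree = {
    "role": "main",
    "children": [
        {"role": "heading", "name": "Top stories"},
        {"role": "link", "name": "Show HN: Example"},
        {"role": "textbox", "name": "Search"},
        {"role": "button", "name": "Submit"},
    ],
}

text, refs = render_snapshot(tree)
print(text)
```

The agent reads the text and answers with a ref such as `e1`; the server resolves that ref back to the element, so no CSS selector or screenshot is needed.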
Real-world browser automation involves authentication, multi-page flows, error recovery, and dynamic content. Here is how each tool handles the hard cases.
Authentication and Sessions: Browser Use supports persistent browser contexts, so cookies and login state survive across agent runs. You can inject cookies, use saved profiles, or let the agent log in autonomously. Stagehand inherits Playwright's browser context management — you handle session persistence through standard Playwright APIs. Playwright MCP maintains browser state within a session but does not persist across server restarts without external session management.
Error Recovery: Browser Use has built-in retry logic. If an action fails, the agent re-observes the page and tries a different approach. This autonomous recovery is its biggest advantage for complex workflows — the agent can adapt when a button moves, a popup appears, or a page loads differently than expected. Stagehand relies on the developer to handle errors, since the developer writes the control flow. Playwright MCP delegates error handling to the connected agent, which can re-snapshot and retry.
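The re-observe-and-retry pattern is the same whether the framework or the connected agent implements it. The Python sketch below is a generic illustration, not code from any of the three tools; `act_with_recovery` and the toy `page` are hypothetical. The key detail is that each attempt starts with a fresh observation and passes earlier failures to the chooser, so the second attempt can differ from the first.

```python
def act_with_recovery(observe, choose_action, execute, max_attempts=3):
    """Try an action; on failure, re-observe and let the chooser adapt."""
    failures = []
    for attempt in range(1, max_attempts + 1):
        state = observe()                        # fresh view of the page
        action = choose_action(state, failures)  # may use past failures
        try:
            execute(action)
            return action, attempt
        except RuntimeError as exc:
            failures.append(f"{action}: {exc}")
    raise RuntimeError(f"gave up after {max_attempts} attempts: {failures}")


# Toy page where a button's selector changed since the last run.
page = {"buttons": ["#buy-new"]}


def observe():
    return list(page["buttons"])


def choose_action(visible, failures):
    # Stub for the LLM: first try a remembered selector, then fall back
    # to whatever the latest observation shows.
    return "#buy-old" if not failures else visible[0]


def execute(selector):
    if selector not in page["buttons"]:
        raise RuntimeError("element not found")


action, attempts = act_with_recovery(observe, choose_action, execute)
print(action, attempts)
```

With Stagehand, a loop like this would live in the developer's own code; with Playwright MCP, the connected agent plays the role of `choose_action`.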
Multi-page Flows: Browser Use natively handles multi-tab scenarios and cross-page navigation. Its agent maintains context across page transitions and can work with multiple tabs simultaneously. Stagehand works within Playwright's page model — multi-page flows require explicit navigation code. Playwright MCP supports multi-tab work through tool calls but requires the agent to manage the logical flow.
We benchmarked a representative task — navigating to Hacker News, extracting the top 5 story titles and URLs, and returning them as structured JSON — across all three tools using Claude Sonnet as the LLM.
Browser Use is the slowest and most expensive because its agent loop observes, plans, and verifies at each step. Stagehand is fastest because the developer can write a direct extraction with minimal LLM calls. Playwright MCP sits in between — the connected agent needs to reason about what tools to call but does not have Browser Use's overhead of full page state per step.
For simple, well-defined tasks, Stagehand is 3-4x cheaper than Browser Use. For complex tasks where the workflow is not known in advance, Browser Use's autonomous planning becomes essential and the cost premium is justified.
A critical consideration is how each tool fits into your existing agent stack.
Browser Use is a self-contained agent framework. It works standalone, or it integrates with LangChain by accepting a chat model from langchain_anthropic (or another provider package) as its LLM. It can also be used as a tool within a LangGraph agent, where a graph node calls Browser Use to handle web-based steps.
Stagehand is a library, not a framework. You call it from your application code — whether that is a LangGraph agent, a FastAPI endpoint, or a simple script. It plays well as a component in larger systems because it does not try to own the agent loop.
Playwright MCP is designed specifically for the MCP ecosystem. If your agent already supports MCP (Claude Code, Cline, Amazon Bedrock AgentCore Gateway, or a custom MCP client), adding browser capabilities is just a configuration change. No code modification to your agent is needed.
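As an illustration of what that configuration change looks like, the Python sketch below adds a server entry to a client configuration dictionary. It assumes the client uses the common `mcpServers` layout and that the server is started with `npx @playwright/mcp@latest`; the config file location and exact keys vary by client, so treat both as assumptions to verify. `add_mcp_server` and the `existing-server` entry are hypothetical.

```python
import json


def add_mcp_server(config: dict, name: str, command: str, args: list) -> dict:
    """Return a copy of the client config with one more MCP server registered."""
    servers = dict(config.get("mcpServers", {}))
    servers[name] = {"command": command, "args": list(args)}
    return {**config, "mcpServers": servers}


# Hypothetical existing client configuration.
existing = {
    "mcpServers": {
        "existing-server": {"command": "example-cmd", "args": []},
    }
}

updated = add_mcp_server(existing, "playwright", "npx", ["@playwright/mcp@latest"])
print(json.dumps(updated, indent=2))
```

The agent's own code is untouched: after a restart, the client discovers the new server's tools through the protocol.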
These tools are not mutually exclusive, and they can be combined within one system.
Beyond these three, the AI browser automation space is expanding rapidly. Projects like Obscura (7,700+ GitHub stars) offer headless browsers specifically optimized for AI agents with built-in anti-detection. AgentCore Browser provides managed cloud browser infrastructure within the AWS ecosystem. OpenAI's Operator and Anthropic's computer use demonstrate that foundation model providers see browser automation as a core agent capability.
The trend is clear: every AI agent framework will eventually need a browser story. Whether that comes from an integrated tool like Browser Use, a clean SDK like Stagehand, or a protocol-level integration like Playwright MCP, the ability to interact with the web is becoming as fundamental as the ability to call APIs.
Whether Browser Use or Stagehand is the better fit for web scraping depends on the scraping complexity. Stagehand is better for structured extraction from known page layouts: its extract() primitive with Zod schemas gives you typed, predictable output with lower token costs. Browser Use is better when you need to navigate complex, multi-page flows to reach the data (login sequences, pagination, dynamic content loading), because its autonomous agent can adapt when pages change unexpectedly.
Playwright MCP works with Claude Code and is one of the most popular MCP servers used with it. Once configured, Claude Code can navigate to your local development server, interact with your UI, take screenshots, and verify behavior. It works in both snapshot mode (for fast, token-efficient interaction) and vision mode (for visual verification). Add the Playwright MCP server to your Claude Code settings and Claude can browse any URL accessible from your machine.
Browser Use and Stagehand both use standard Playwright browser instances, which can be detected by sophisticated anti-bot systems. Browser Use supports connecting to external browser services (like Browserbase or BrightData) that provide anti-detection features. Stagehand integrates natively with Browserbase for this purpose. Playwright MCP uses standard Playwright and does not include anti-detection — you would need to connect it to a stealth browser service. For production scraping, all three tools benefit from cloud browser infrastructure that handles fingerprinting and CAPTCHA solving.
For a simple extraction task (navigate to a page, extract structured data), Stagehand uses approximately 1,000-3,000 tokens, Playwright MCP snapshot mode uses 1,500-5,000 tokens, and Browser Use uses 5,000-15,000 tokens. The gap widens with task complexity. A 10-step workflow might cost $0.01-0.03 with Stagehand, $0.02-0.05 with Playwright MCP, and $0.05-0.15 with Browser Use. Browser Use's higher cost reflects its full page observation at each step, which is what enables its autonomous recovery and adaptation capabilities.
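To see what these per-run figures mean at volume, the Python sketch below projects them to a monthly bill. The per-run ranges are the ones quoted in the paragraph above, stored in cents to avoid floating-point rounding; the volume of 1,000 runs per day and the 30-day month are arbitrary assumptions, and `monthly_cost_usd` is a hypothetical helper.

```python
# Cost of one 10-step workflow, in US cents (low, high), as quoted above.
COST_CENTS_PER_RUN = {
    "Stagehand": (1, 3),
    "Playwright MCP": (2, 5),
    "Browser Use": (5, 15),
}


def monthly_cost_usd(tool: str, runs_per_day: int, days: int = 30):
    """Project the per-run cost range to a monthly range in dollars."""
    low_cents, high_cents = COST_CENTS_PER_RUN[tool]
    runs = runs_per_day * days
    return (low_cents * runs / 100, high_cents * runs / 100)


projection = {tool: monthly_cost_usd(tool, runs_per_day=1000) for tool in COST_CENTS_PER_RUN}
for tool, (low, high) in projection.items():
    print(f"{tool}: ${low:,.0f} to ${high:,.0f} per month")
```

At this assumed volume the difference between tools is measured in thousands of dollars per month, which is why matching the tool to the task's actual need for autonomy matters.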
Aaron is an engineering leader, software architect, and founder with 18 years of experience building distributed systems and cloud infrastructure. He now focuses on LLM-powered platforms, agent orchestration, and production AI, and shares hands-on technical guides and framework comparisons at fp8.co.