In this tutorial, we will take Vercel's new Agent Browser for a spin. It's a headless browser automation CLI designed specifically for AI agents, featuring a fast native Rust CLI with a Node.js fallback.
Quickstart
Here is the quick-start workflow we will build. The detailed walkthrough follows below.
1. Install & Setup
npm install -g agent-browser
agent-browser install
2. Start the flow
agent-browser open http://localhost:3000
3. Get the AI-friendly snapshot with refs
agent-browser snapshot -i
4. Interact using deterministic refs
agent-browser click @e2
Why Agent Browser?
Traditional E2E testing tools like Cypress or Playwright are great for humans writing tests. But for AI Agents? They struggle. DOM dumps are huge, token-heavy, and full of noise.
Vercel's Agent Browser solves this by providing deterministic refs (e.g., @e1, @e2) and a cleaned-up accessibility tree. It allows agents—whether it's Claude Code, Cursor, or Gemini—to "see" the page exactly like a user would, without getting bogged down in "div soup."
Key Features
- Universal: Works with any agent (Claude Code, Cursor, Gemini, etc.).
- AI-First: Snapshot returns an accessibility tree instead of raw HTML.
- Fast: Native Rust binary for instant command parsing.
- Deterministic: Refs stay consistent during the session.
The Architecture: Rust + Daemon
Under the hood, this isn’t just another Puppeteer wrapper. It uses a Client-Daemon architecture:
- Rust CLI: Provides instant startup and command parsing.
- Node.js Daemon: Manages the persistent Playwright browser instance.
The daemon starts automatically on your first command and stays alive. Subsequent commands are much faster because they reuse the running browser and skip the "spin up browser" overhead every time.
The Use Case: Testing "AI in a Shell"
We are going to test a core user journey in my platform, AI in a Shell:
- Navigate to the homepage.
- Find the "Courses" section.
- Click on the "MCP Fundamentals" course.
- Verify the course page loads correctly.
Step 1: Installation
First, grab the CLI tool globally. It handles the heavy lifting of downloading a compatible Chromium build for you.
npm install -g agent-browser
agent-browser install
Make sure your application is running locally:
npm run dev
# > Ready on http://localhost:3000
Step 2: Opening the Browser
We start the agent session by pointing it to our local instance.
agent-browser open http://localhost:3000
Step 3: The Power of Refs (@e1)
Instead of inspecting HTML, we ask for an interactive snapshot.
agent-browser snapshot -i
The output will look something like this:
- link "AI in a Shell" [ref=e1]
- button "Sign In" [ref=e2]
- heading "Master AI Engineering" [level=1]
- link "Explore Courses" [ref=e12]
Why use Refs?
The snapshot assigns a unique ID like @e12 to elements.
- Deterministic: @e12 points to that exact link.
- Token Efficient: We aren't feeding 500 lines of HTML to the LLM.
- Fast: No need to re-query the DOM.
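To make the ref idea concrete, here is a minimal sketch of how a script could look up a ref by its visible name instead of hardcoding @e12. The helper name find_ref is hypothetical (it is not part of agent-browser), and the sketch assumes the snapshot lines follow the format shown in the sample output above.

```shell
# Hypothetical helper: print the ref of the first snapshot line whose
# quoted name matches exactly. Reads snapshot text on stdin.
find_ref() {
  name="$1"
  # Keep lines containing "<name>", then print only the id inside [ref=...].
  # Lines without a ref (such as headings) produce no output.
  grep -F "\"$name\"" | sed -n 's/.*\[ref=\([^]]*\)\].*/\1/p' | head -n 1
}

# Sample snapshot text copied from the output above.
snapshot='- link "AI in a Shell" [ref=e1]
- button "Sign In" [ref=e2]
- heading "Master AI Engineering" [level=1]
- link "Explore Courses" [ref=e12]'

printf '%s\n' "$snapshot" | find_ref "Explore Courses"
# prints: e12
```

In a live session the input would come from agent-browser snapshot -i rather than a variable.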
Step 4: Navigating the Course
I want to click "Explore Courses". In the snapshot above, that’s reference @e12.
agent-browser click @e12
Now, let's see the course list by pulling another snapshot:
agent-browser snapshot -i
Output:
- heading "Available Courses" [level=2]
- link "MCP Fundamentals" [ref=e24]
- link "Azure AI" [ref=e25]
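If an agent only cares about one kind of element, such as the course links here, the snapshot can be filtered down further before it reaches the model. The sketch below is one way to do that. The helper name refs_by_role is hypothetical, and it assumes each line has the shape shown in the output above (role, quoted name, then a ref tag).

```shell
# Hypothetical helper: for every snapshot line with the given role,
# print "<ref> <name>". Reads snapshot text on stdin.
refs_by_role() {
  role="$1"
  # Allow leading indentation in case the real tree is nested (an assumption).
  sed -n "s/^ *- $role \"\(.*\)\" \[ref=\([^]]*\)\]\$/\2 \1/p"
}

# Sample snapshot text copied from the output above.
courses='- heading "Available Courses" [level=2]
- link "MCP Fundamentals" [ref=e24]
- link "Azure AI" [ref=e25]'

printf '%s\n' "$courses" | refs_by_role link
# prints:
# e24 MCP Fundamentals
# e25 Azure AI
```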
Step 5: Verify Destination
Let's enter the course and verify we landed on the right page by checking the title.
agent-browser click @e24
agent-browser get title
# > "MCP Fundamentals - AI in a Shell"
Success! We navigated a full user journey using nothing but semantic refs.
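The whole journey can also be scripted so it runs unattended. The following is a sketch under a few assumptions: the AB variable and the helper names (ref_for, click_by_name, run_journey) are hypothetical, the snapshot format matches the output shown earlier, and get title prints text that contains the page title. It only uses the commands from this walkthrough (open, snapshot -i, click, get title).

```shell
# Command to run; override AB to point at a stub when testing without a browser.
AB="${AB:-agent-browser}"

# Print the ref of the first element in a fresh snapshot with the given name.
ref_for() {
  $AB snapshot -i | grep -F "\"$1\"" | sed -n 's/.*\[ref=\([^]]*\)\].*/\1/p' | head -n 1
}

# Click an element by its visible name; fail if the snapshot has no such element.
click_by_name() {
  ref=$(ref_for "$1")
  [ -n "$ref" ] || { echo "no ref found for '$1'" >&2; return 1; }
  $AB click "@$ref"
}

# Homepage -> Courses -> MCP Fundamentals -> check the title.
run_journey() {
  $AB open "http://localhost:3000" &&
  click_by_name "Explore Courses" &&
  click_by_name "MCP Fundamentals" &&
  title=$($AB get title) &&
  case "$title" in
    *"MCP Fundamentals"*) echo "journey passed" ;;
    *) echo "unexpected title: $title" >&2; return 1 ;;
  esac
}
```

Call run_journey with the dev server running. Because each click takes a fresh snapshot and resolves the ref by name, the script does not depend on the specific numbers (@e12, @e24) from this session.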
Bonus: Serverless & Custom Browsers
If you are running this in a Vercel Function or AWS Lambda, you can swap out the browser binary for a lightweight build like @sparticuz/chromium (~50MB).
import chromium from '@sparticuz/chromium';
import { BrowserManager } from 'agent-browser';

export async function handler() {
  const browser = new BrowserManager();
  await browser.launch({
    executablePath: await chromium.executablePath(),
    headless: true,
  });
  // ... perform your agent actions
}
Conclusion
Having a browser tool that speaks "AI native" is a game changer. The ability to abstract away the DOM into a clean list of Refs makes building reliable agents significantly easier.
We can now build agents that crawl our production sites, verify critical paths, and report back, all without writing a single line of brittle CSS selector code.
Cheers to you 🎉