@hugs
Last active July 24, 2025 03:10
Create a browser automation library from scratch!

Building Vibium: A Deep Dive into Modern Browser Automation

How to bootstrap your own automation library using WebDriver BiDi, and why sometimes you need to build from scratch

The Evolution of Browser Automation

When I created Selenium back in 2004, the web was transforming. "Web 2.0" was emerging with dynamic, JavaScript-heavy applications like Gmail and Google Maps. Traditional testing tools couldn't handle these complex, interactive web apps—they were built for static HTML pages, not the new world of asynchronous JavaScript communication and dynamic DOM manipulation.

Selenium was born out of necessity to test these sophisticated web applications. But in those early days, browser vendors didn't care about testing or automation. At. All. Every time a new browser version came out, they broke Selenium and we had to scramble to fix things with our humble crew of open source volunteers.

The grand vision for Selenium 2 was to break this cycle: Make the Vendors Care. More specifically, the people responsible for breaking things are in the best position to fix them. So we embarked on a big effort to move responsibility for browser automation from our volunteer community to the employees of the browser vendors themselves.

To achieve this, we needed coordination. We got the existing code transferred into a "neutral playing field" foundation, and initiated the process to create a formal W3C specification. The WebDriver protocol that emerged became the W3C standard and served us well for over a decade.

But while the Selenium project members were busy building the United Nations of browser vendor collaboration, the web didn't sit still. Code got more complicated, and the demand for more speed and more features kept rising. Chrome DevTools Protocol came along and changed everything. Finally, there was a crazy fast protocol to automate anything and everything and (bonus!) it was maintained by the vendor. Only problem... it was just one vendor.

All the effort on cross-browser collaboration started to backslide as Chrome became (often, alas) the only browser people cared about. We didn't give up, though. We had been here before. When Selenium was first created, Microsoft Internet Explorer was the only browser most devs cared about. It's important to keep the dream of open web standards alive even when it's not fashionable.

History teaches us that browser monocultures are temporary but dangerous. If one browser dominates, innovation slows, developers get lazy about standards compliance, and users lose choice. IE's dominance in the early 2000s nearly killed web innovation—we got years of stagnation, proprietary extensions, and broken websites that only worked in one browser. The web became less accessible, less diverse, and less resilient.

The rise of Firefox, Safari, and eventually Chrome broke that monopoly and ushered in a golden age of web innovation. Standards mattered again. Developers had to write portable code. Users could choose their browser based on features, privacy, or performance rather than compatibility. Competition drove everyone to build better, faster, more secure browsers.

Today's Chrome dominance feels different because Chrome is genuinely good and Google champions many open standards. But the patterns are eerily similar: developers optimize for one browser, testing becomes Chrome-only, and alternative browsers struggle to keep up. The web works best when it's truly open—when standards are implemented consistently across browsers and users have real choice.

The appeal of CDP was undeniable, though. Suddenly, tools like Puppeteer (2017) and Playwright (2020) could access the browser's internal APIs directly. They were faster, more reliable, and could do things WebDriver simply couldn't—like intercepting network requests in real-time, taking high-fidelity screenshots, or monitoring performance metrics. Developers flocked to these new tools because they solved real problems that WebDriver HTTP couldn't address efficiently.

But here's the thing: we'd seen this movie before. CDP was Chrome-specific and proprietary—great for Chrome users, but it fractured the automation ecosystem. Firefox had its own debugging protocol, Safari had WebKit's remote debugging, and each required different tooling. The automation community was splitting again, just like in the early 2000s when every browser had its own quirky automation approach.

Now, there's a paradox here that I can't ignore. Competition is good, but so are unified standards. Microsoft deserves credit—Playwright has genuinely pushed the automation space forward with better APIs, faster execution, and innovative features. Meanwhile, Selenium was admittedly getting a bit long in the tooth. So how do we square this circle? How do we encourage innovation while preventing lock-in?

The answer, I think, is in the implementation, not the interface. We need shared standards (like WebDriver BiDi) that multiple tools can implement in their own innovative ways. Think of it like the web itself—HTML, CSS, and JavaScript are standards, but browsers compete fiercely on performance, developer tools, and user experience. The standard provides interoperability; the implementations provide innovation. When both the standard AND the implementation come from the same source, that's when we lose the healthy tension that drives progress.

The solution? WebDriver BiDi—the next evolution that combines WebDriver's cross-browser standardization with CDP's power and speed. It's bidirectional (hence "BiDi"), real-time, and designed to give us the best of both worlds: the performance and capabilities we love from CDP, with the cross-browser compatibility that keeps the web open and diverse.

Technically, BiDi borrows heavily from CDP's successful design. Like CDP, it uses WebSockets for real-time communication and JSON for message formatting—no more slow HTTP request/response cycles. Commands and events flow bidirectionally: you can send commands to the browser AND receive events as they happen. Want to know when a network request completes? You get an event immediately, not after polling for status.

But here's where BiDi diverges from CDP: it's designed as a cross-browser standard from day one. While CDP grew organically from Chrome's internal debugging needs, BiDi was architected with input from all browser vendors. The protocol is more structured, the command namespaces are cleaner, and there's a formal specification process. It's like CDP, but with the benefit of hindsight and collaborative design rather than evolutionary growth.
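To make the command and event flow concrete, here is a minimal sketch of the message shapes a BiDi client sees on the socket. The field names follow the WebDriver BiDi specification as I understand it (incoming messages carry a `type` of `success`, `error`, or `event`); the `classify` helper is hypothetical and exists only for illustration.

```typescript
// Outgoing command: the client picks the id and the browser echoes it back.
interface BidiCommand {
  id: number;
  method: string;
  params: Record<string, unknown>;
}

// Incoming messages carry a `type` discriminator.
type BidiIncoming =
  | { type: 'success'; id: number; result: Record<string, unknown> }
  | { type: 'error'; id: number | null; error: string; message: string }
  | { type: 'event'; method: string; params: Record<string, unknown> };

// Hypothetical helper: describe an incoming message in one line.
function classify(raw: string): string {
  const msg = JSON.parse(raw) as BidiIncoming;
  switch (msg.type) {
    case 'success':
      return `response to command ${msg.id}`;
    case 'error':
      return `command ${msg.id} failed: ${msg.error}`;
    case 'event':
      return `event ${msg.method}`;
    default:
      return 'unknown message';
  }
}

console.log(classify('{"type":"success","id":1,"result":{}}'));
console.log(classify('{"type":"event","method":"network.responseCompleted","params":{}}'));
```

Responses are matched to commands by `id`, while events have no `id` at all. That difference is what lets a client tell the two apart on a single socket.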

The Current State of BiDi

WebDriver BiDi represents a fascinating experiment in collaborative standards development. Unlike the early days of Selenium where we were essentially reverse-engineering browser internals, BiDi has been developed with direct input from all major browser vendors. Chrome, Firefox, Safari, and Edge teams are all at the table, working together to create something that serves everyone's needs.

Where BiDi stands today:

  • Chrome: BiDi support is available and stable, delivered through ChromeDriver as a translation layer on top of CDP
  • Firefox: Active development with substantial BiDi support
  • Safari: Participating in spec development, implementation in progress
  • Edge: Following Chrome's implementation closely

Adoption in automation tools:

  • Selenium 4: BiDi support alongside traditional WebDriver—users get both options
  • Puppeteer: Experimenting with BiDi as an alternative to CDP
  • Playwright: Monitoring the spec but continuing to focus on their multi-protocol approach

The interesting challenge is that BiDi needs to be sophisticated enough to replace CDP's capabilities while remaining simple enough for all browsers to implement consistently. It's a delicate balance between power and portability—give too much power and implementations diverge, too little and developers stick with proprietary alternatives.

Alright, enough context. Let's build something and see what BiDi can do!

Why Build From Scratch?

Sometimes you need to strip away all the abstractions and understand how things really work. Building Vibium from scratch taught me things I couldn't learn any other way.

Building Vibium: Step by Step

Let's walk through building a BiDi automation library from scratch. We'll assume Google Chrome on macOS, but the principles apply everywhere.

Step 1: Launching Chrome

The first step is launching Chrome with debugging enabled:

/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
  --remote-debugging-port=9222 \
  --user-data-dir=/tmp/chrome-bidi-$(date +%s) \
  --no-first-run \
  --no-default-browser-check \
  --disable-default-apps

Key flags:

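  • --remote-debugging-port=9222: Opens Chrome's remote debugging server on port 9222 (an HTTP endpoint for target discovery plus the WebSocket endpoints used below)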
  • --user-data-dir=/tmp/...: Isolated profile for automation
  • --no-first-run: Skips setup dialogs

In Node.js:

import { spawn } from 'child_process';

const chromeProcess = spawn('/Applications/Google Chrome.app/Contents/MacOS/Google Chrome', [
  '--remote-debugging-port=9222',
  '--user-data-dir=/tmp/chrome-bidi-' + Date.now(),
  '--no-first-run',
  '--no-default-browser-check',
  '--disable-default-apps'
], {
  stdio: 'ignore',
  detached: true
});

chromeProcess.unref(); // Don't wait for Chrome to exit
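Spawning returns immediately, so the debugging port may not be accepting connections yet. Polling is one way to bridge that gap. This is a sketch under my own assumptions, not the library's actual startup logic: `waitUntilReady`, its parameters, and `chromeProbe` are hypothetical names, and the probe relies on Chrome answering `/json/version` once the port is up.

```typescript
// Hypothetical helper: retry `probe` until it resolves true or attempts run out.
async function waitUntilReady(
  probe: () => Promise<boolean>,
  attempts = 50,
  delayMs = 100,
): Promise<void> {
  for (let i = 0; i < attempts; i++) {
    // A probe that rejects (e.g. connection refused) counts as "not ready yet".
    const ready = await probe().catch(() => false);
    if (ready) return;
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error(`Not ready after ${attempts} attempts`);
}

// Probe for Chrome's debugging endpoint on the given port.
const chromeProbe = (port: number) => async (): Promise<boolean> => {
  const response = await fetch(`http://localhost:${port}/json/version`);
  return response.ok;
};

// Usage: await waitUntilReady(chromeProbe(9222));
```

Taking the probe as a parameter keeps the retry loop independent of Chrome, so it can be exercised with a fake probe.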

Step 2: Finding the WebSocket URL

Chrome exposes its debugging targets via HTTP:

curl http://localhost:9222/json

This returns JSON with available targets:

[
  {
    "id": "FDC46E8D1BE328C613F2AB9E82756538",
    "title": "New Tab",
    "type": "page",
    "url": "chrome://newtab/",
    "webSocketDebuggerUrl": "ws://localhost:9222/devtools/page/FDC46E8D1BE328C613F2AB9E82756538"
  },
  {
    "id": "D2ABF7E260DB9F5777174832219B8106",
    "parentId": "FDC46E8D1BE328C613F2AB9E82756538",
    "title": "chrome-untrusted://new-tab-page/one-google-bar",
    "type": "iframe",
    "url": "chrome-untrusted://new-tab-page/one-google-bar",
    "webSocketDebuggerUrl": "ws://localhost:9222/devtools/page/D2ABF7E260DB9F5777174832219B8106"
  }
]

We want the main page (not iframes):

async function getWebSocketUrl(): Promise<string> {
  const response = await fetch('http://localhost:9222/json');
  const targets = await response.json();
  
  const mainPage = targets.find((target: any) => 
    target.type === 'page' && !target.parentId
  );
  
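  if (!mainPage) {
    throw new Error('No top-level page target found; is Chrome running with --remote-debugging-port?');
  }
  return mainPage.webSocketDebuggerUrl;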
}

Step 3: WebSocket Connection

BiDi uses WebSocket for bidirectional communication:

const wsUrl = await getWebSocketUrl();
const ws = new WebSocket(wsUrl);

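await new Promise<void>((resolve, reject) => {
  ws.addEventListener('open', () => resolve());
  // Without this, a refused connection would leave the promise pending forever
  ws.addEventListener('error', () => reject(new Error(`WebSocket connection failed: ${wsUrl}`)));
});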

Step 4: Enabling Domains

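Before Chrome will report events for an area, you need to enable the relevant domain. (Page, Runtime, and DOM are CDP domain names, which is what the /devtools/page/ socket from Step 2 speaks.)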

// Enable page events and navigation
await sendCommand('Page.enable');

// Enable JavaScript execution
await sendCommand('Runtime.enable');

// Enable DOM access
await sendCommand('DOM.enable');

Why enable domains? The protocol is modular, so you only pay for what you use. Enabling a domain tells Chrome to start tracking and reporting events for that area. Many commands, Page.navigate and Runtime.evaluate included, work without it.
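The `Page.enable` style calls above are Chrome DevTools Protocol methods. In the WebDriver BiDi specification the closest equivalents are a session, event subscriptions, and commands addressed to a browsing context. The sketch below only builds the JSON payloads. The method and parameter names are taken from the BiDi specification as I understand it, while `makeCommand` and the context id `ctx-1` are placeholders of mine.

```typescript
let nextId = 0;

// Hypothetical helper that wraps a method and params in the command envelope.
function makeCommand(method: string, params: Record<string, unknown> = {}) {
  return { id: ++nextId, method, params };
}

const context = 'ctx-1'; // placeholder; real ids come from browsingContext.getTree

const commands = [
  // Open a session (only needed when one does not exist yet).
  makeCommand('session.new', { capabilities: {} }),
  // Rough counterpart of Network.enable: opt in to specific events.
  makeCommand('session.subscribe', { events: ['network.responseCompleted'] }),
  // Counterpart of Page.navigate.
  makeCommand('browsingContext.navigate', { context, url: 'https://example.com', wait: 'complete' }),
  // Counterpart of Runtime.evaluate.
  makeCommand('script.evaluate', {
    expression: 'document.title',
    target: { context },
    awaitPromise: false,
  }),
];

for (const command of commands) {
  console.log(JSON.stringify(command));
}
```

The envelope (`id`, `method`, `params`) is the same in both protocols, so the request/response plumbing built later in this walkthrough carries over unchanged.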

Sidebar: HTTP Status Codes - Then vs Now

The old Selenium way to get HTTP status codes was painful:

// Selenium - Required browser logs or proxy setup
LogEntries logs = driver.manage().logs().get(LogType.BROWSER);
for (LogEntry entry : logs) {
    // Parse console messages hoping to find network info
    if (entry.getMessage().contains("Failed to load resource")) {
        // Try to extract status code from error message
    }
}

With a bidirectional protocol, it takes a few lines (shown here with CDP's Network domain):

// Listen for network events (register before navigating so nothing is missed)
ws.addEventListener('message', (event) => {
  const message = JSON.parse(event.data);
  if (message.method === 'Network.responseReceived') {
    console.log('Status:', message.params.response.status); // 404
  }
});

// Enable network monitoring
await sendCommand('Network.enable');

// Navigate
await sendCommand('Page.navigate', { url: 'https://httpstat.us/404' });

Network monitoring is built-in and real-time. Status codes, headers, and timing arrive with the events; response bodies take one extra Network.getResponseBody call.
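If you want the status of one particular request rather than a stream of log lines, the same events can be folded into a lookup table. A minimal sketch, assuming the CDP `Network.responseReceived` payload shape used above (`params.response.url` and `params.response.status`); `recordStatus` is a hypothetical helper.

```typescript
// Map of response URL -> most recent HTTP status seen for it.
const statuses = new Map<string, number>();

// Hypothetical helper: feed it every raw WebSocket message.
function recordStatus(raw: string): void {
  const message = JSON.parse(raw);
  if (message.method !== 'Network.responseReceived') return; // ignore other traffic
  const { url, status } = message.params.response;
  statuses.set(url, status);
}

// Simulated traffic: one command response and one network event.
recordStatus(JSON.stringify({ id: 1, result: {} }));
recordStatus(JSON.stringify({
  method: 'Network.responseReceived',
  params: { response: { url: 'https://httpstat.us/404', status: 404 } },
}));

console.log(statuses.get('https://httpstat.us/404')); // 404
```

Against a live browser, the wiring would be `ws.addEventListener('message', (event) => recordStatus(event.data))`.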

Step 5: Building the API

With the foundation in place, building automation commands is straightforward:

// Navigation
async function navigate(url: string) {
  await sendCommand('Page.navigate', { url });
}

// Find elements
async function findElement(selector: string) {
  // JSON.stringify quotes the selector safely, even if it contains quotes
  const result = await sendCommand('Runtime.evaluate', {
    expression: `document.querySelector(${JSON.stringify(selector)})`
  });
  return result.result; // a remote object handle, not a DOM node
}

// Click elements
async function click(selector: string) {
  await sendCommand('Runtime.evaluate', {
    expression: `document.querySelector(${JSON.stringify(selector)}).click()`
  });
}

// Type text
async function type(selector: string, text: string) {
  // Sets the value directly; no key or input events are fired
  await sendCommand('Runtime.evaluate', {
    expression: `document.querySelector(${JSON.stringify(selector)}).value = ${JSON.stringify(text)}`
  });
}

// Screenshots
async function screenshot() {
  const result = await sendCommand('Page.captureScreenshot');
  return result.data; // base64 PNG
}

The beauty of BiDi is that complex operations become simple JavaScript execution.
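One caveat when building expressions from strings: interpolating a selector straight into source text breaks as soon as the selector contains a quote. The sketch below shows the failure and one way around it, using `JSON.stringify` to produce a safely quoted literal. All four function names are hypothetical helpers, not part of any protocol.

```typescript
// Naive interpolation: breaks when the selector itself contains a single quote.
function naiveClickExpression(selector: string): string {
  return `document.querySelector('${selector}').click()`;
}

// JSON.stringify yields a double-quoted, escaped JavaScript string literal.
function clickExpression(selector: string): string {
  return `document.querySelector(${JSON.stringify(selector)}).click()`;
}

// Setting .value alone does not notify frameworks; dispatch an input event too.
function typeExpression(selector: string, text: string): string {
  return `(() => {
    const el = document.querySelector(${JSON.stringify(selector)});
    el.value = ${JSON.stringify(text)};
    el.dispatchEvent(new Event('input', { bubbles: true }));
  })()`;
}

// Returns true when `source` parses as JavaScript (it is compiled, never run).
function parses(source: string): boolean {
  try {
    new Function(source);
    return true;
  } catch {
    return false;
  }
}

const selector = "input[name='q']";
console.log(parses(naiveClickExpression(selector))); // false
console.log(parses(clickExpression(selector)));      // true
```

The same quoting applies to any user-supplied text, which is why `typeExpression` escapes both the selector and the value.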

Step 6: Request/Response Handling

BiDi uses a request/response pattern with message IDs:

class Vibium {
  private ws!: WebSocket; // assigned once the connection from Step 3 is open
  private messageId = 0;
  private pendingMessages = new Map();

  private async sendCommand(method: string, params: any = {}): Promise<any> {
    const id = ++this.messageId;
    const message = { id, method, params };

    return new Promise((resolve, reject) => {
      // Store the promise resolvers
      this.pendingMessages.set(id, { resolve, reject });
      
      // Send the command
      this.ws.send(JSON.stringify(message));
      
      // Timeout after 30 seconds
      setTimeout(() => {
        if (this.pendingMessages.has(id)) {
          this.pendingMessages.delete(id);
          reject(new Error(`Command timeout: ${method}`));
        }
      }, 30000);
    });
  }

  private setupMessageHandler() {
    this.ws.addEventListener('message', (event) => {
      const message = JSON.parse(event.data);
      
      if (message.id && this.pendingMessages.has(message.id)) {
        const { resolve, reject } = this.pendingMessages.get(message.id);
        this.pendingMessages.delete(message.id);
        
        if (message.error) {
          reject(new Error(message.error.message));
        } else {
          resolve(message.result);
        }
      }
    });
  }
}

Sidebar: The Cleanup Nightmare

Proper cleanup turned out to be the trickiest part of building Vibium. The symptom: everything worked perfectly, but Node.js would hang for 30 seconds after the automation finished.

The investigation:

  • Chrome process died immediately ✅
  • WebSocket closed properly ✅
  • No obvious leaks ✅
  • But Node.js wouldn't exit 🤔

The culprit: setTimeout timers from command timeouts! Even though we cleared the pendingMessages Map, the actual timer handles were still active, keeping Node.js alive.

The fix:

private pendingMessages = new Map<number, { 
  resolve: Function; 
  reject: Function; 
  timeoutId?: NodeJS.Timeout 
}>();

private async sendCommand(method: string, params: any = {}): Promise<any> {
  // ... setup code ...
  
  return new Promise((resolve, reject) => {
    const timeoutId = setTimeout(() => {
      if (this.pendingMessages.has(id)) {
        this.pendingMessages.delete(id);
        reject(new Error(`Command timeout: ${method}`));
      }
    }, 30000);
    
    // Store timeout ID so we can clear it later
    this.pendingMessages.set(id, { resolve, reject, timeoutId });
    this.ws.send(JSON.stringify(message));
  });
}

private setupMessageHandler() {
  this.ws.addEventListener('message', (event) => {
    const message = JSON.parse(event.data);
    const pending = this.pendingMessages.get(message.id);
    if (!pending) return;

    // Clear the timer as soon as the response arrives; otherwise a
    // completed command still holds the event loop for up to 30 seconds
    if (pending.timeoutId) clearTimeout(pending.timeoutId);
    this.pendingMessages.delete(message.id);
    // ... resolve or reject as before ...
  });
}

async close() {
  // Clear ALL timeouts during cleanup
  for (const [id, { reject, timeoutId }] of this.pendingMessages) {
    if (timeoutId) clearTimeout(timeoutId);
    reject(new Error('Browser closing'));
  }
  this.pendingMessages.clear();
  
  // Close WebSocket and kill Chrome
  this.ws.close();
  this.chromeProcess.kill('SIGKILL');
}

Lesson learned: in Node.js, every active handle (timers, sockets, processes) keeps the event loop alive. Proper cleanup means clearing everything.
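Pulled out of the class, the same bookkeeping can be exercised without a browser. This is a sketch, not Vibium's actual code: `PendingCommands` and its injected `send` callback are hypothetical. It releases the timer when a response arrives and when everything is closed, since a completed command would otherwise keep its timer alive.

```typescript
type Pending = {
  resolve: (value: unknown) => void;
  reject: (reason: Error) => void;
  timeoutId: ReturnType<typeof setTimeout>;
};

class PendingCommands {
  private nextId = 0;
  private pending = new Map<number, Pending>();
  private send: (raw: string) => void;
  private timeoutMs: number;

  constructor(send: (raw: string) => void, timeoutMs = 30000) {
    this.send = send;
    this.timeoutMs = timeoutMs;
  }

  request(method: string, params: object = {}): Promise<unknown> {
    const id = ++this.nextId;
    return new Promise((resolve, reject) => {
      const timeoutId = setTimeout(() => {
        this.pending.delete(id);
        reject(new Error(`Command timeout: ${method}`));
      }, this.timeoutMs);
      this.pending.set(id, { resolve, reject, timeoutId });
      this.send(JSON.stringify({ id, method, params }));
    });
  }

  // Call with every raw message from the socket.
  handle(raw: string): void {
    const message = JSON.parse(raw);
    const entry = this.pending.get(message.id);
    if (!entry) return; // an event, or a response that already timed out
    clearTimeout(entry.timeoutId); // release the timer for a completed command
    this.pending.delete(message.id);
    if (message.error) entry.reject(new Error(message.error.message));
    else entry.resolve(message.result);
  }

  // Reject whatever is still in flight and release its timers.
  closeAll(reason = 'Browser closing'): void {
    for (const { reject, timeoutId } of this.pending.values()) {
      clearTimeout(timeoutId);
      reject(new Error(reason));
    }
    this.pending.clear();
  }

  get size(): number {
    return this.pending.size;
  }
}
```

In a real client, `send` would be `(raw) => ws.send(raw)` and `handle` would be called from the socket's message listener.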

The Complete Vibium Library

Here's what we built in ~200 lines of TypeScript:

import { spawn, ChildProcess } from 'child_process';

// Type declaration for Node.js built-in WebSocket
declare global {
  var WebSocket: {
    new (url: string): WebSocket;
  };
  interface WebSocket {
    addEventListener(type: 'open', listener: () => void): void;
    addEventListener(type: 'message', listener: (event: { data: string }) => void): void;
    addEventListener(type: 'error', listener: (event: any) => void): void;
    addEventListener(type: 'close', listener: () => void): void;
    send(data: string): void;
    close(): void;
  }
}

export class Vibium {
  private port: number;
  private chromePath: string;
  private chromeProcess?: ChildProcess;
  private ws?: WebSocket;
  private messageId = 0;
  private pendingMessages = new Map<number, { 
    resolve: Function; 
    reject: Function; 
    timeoutId?: NodeJS.Timeout 
  }>();

  constructor(options: { port?: number; chromePath?: string } = {}) {
    this.port = options.port || 9222;
    this.chromePath = options.chromePath || 
      '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome';
  }

  async launch(): Promise<void> {
    const userDataDir = `/tmp/chrome-bidi-${Date.now()}`;
    
    this.chromeProcess = spawn(this.chromePath, [
      `--remote-debugging-port=${this.port}`,
      `--user-data-dir=${userDataDir}`,
      '--no-first-run',
      '--no-default-browser-check',
      '--disable-default-apps'
    ], {
      stdio: 'ignore',
      detached: true
    });

    this.chromeProcess.unref();
    await this.waitForChrome();
  }

  async connect(): Promise<void> {
    const wsUrl = await this.getWebSocketUrl();
    this.ws = new WebSocket(wsUrl);
    
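    return new Promise((resolve, reject) => {
      this.ws!.addEventListener('open', () => {
        this.setupMessageHandler();
        resolve();
      });
      // Without this, a refused connection would leave connect() pending forever
      this.ws!.addEventListener('error', () => {
        reject(new Error(`WebSocket connection failed: ${wsUrl}`));
      });
    });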
  }

  async init(): Promise<void> {
    await this.sendCommand('Page.enable');
    await this.sendCommand('Runtime.enable');
    await this.sendCommand('DOM.enable');
  }

  async navigate(url: string): Promise<void> {
    await this.sendCommand('Page.navigate', { url });
  }

  async click(selector: string): Promise<void> {
    // JSON.stringify quotes the selector safely, even if it contains quotes
    await this.sendCommand('Runtime.evaluate', {
      expression: `document.querySelector(${JSON.stringify(selector)}).click()`
    });
  }

  async type(selector: string, text: string): Promise<void> {
    await this.sendCommand('Runtime.evaluate', {
      expression: `document.querySelector(${JSON.stringify(selector)}).value = ${JSON.stringify(text)}`
    });
  }

  async getTitle(): Promise<string> {
    const result = await this.sendCommand('Runtime.evaluate', {
      expression: 'document.title'
    });
    return result.result.value;
  }

  async screenshot(): Promise<{ data: string }> {
    const result = await this.sendCommand('Page.captureScreenshot');
    return { data: result.data };
  }

  async close(): Promise<void> {
    // Clear all timeouts and reject pending messages
    for (const [id, { reject, timeoutId }] of this.pendingMessages) {
      if (timeoutId) clearTimeout(timeoutId);
      reject(new Error('Browser closing'));
    }
    this.pendingMessages.clear();
    
    // Cleanup WebSocket and Chrome process
    if (this.ws) {
      this.ws.close();
    }
    
    if (this.chromeProcess && !this.chromeProcess.killed) {
      this.chromeProcess.kill('SIGKILL');
      this.chromeProcess.removeAllListeners();
    }
  }

  // Uses Node.js 22+ built-in fetch and WebSocket APIs
  private async httpGet(url: string): Promise<any> {
    const response = await fetch(url);
    return response.json();
  }

  // ... other private helper methods ...
}

Usage:

const browser = new Vibium();
await browser.launch();
await browser.connect();
await browser.init();

await browser.navigate('https://example.com');
await browser.type('#search', 'vibium automation');
await browser.click('#submit');

const title = await browser.getTitle();
const screenshot = await browser.screenshot();

await browser.close();

Light, lean, fast, and standards-based. Everything you need for modern browser automation in a clean, understandable package.

The Future: AI-Powered Automation

The next evolution of browser automation isn't just about better protocols—it's about intelligence. Here's where I see things heading:

Self-Healing Tests

// Traditional way - brittle
await browser.click('#submit-button');

// AI-enhanced way - resilient
await browser.clickWithIntent('submit the form', {
  fallbacks: ['#submit', '.submit-btn', '[type="submit"]'],
  aiAssist: true
});

AI can help tests adapt when selectors change, suggest better locators, and even understand user intent rather than rigid element matching.
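The AI part is speculative, but the fallback half of that idea is plain control flow. A sketch, assuming a `tryClick` callback that rejects when a selector matches nothing; `clickWithFallbacks` is hypothetical and not an existing Vibium method.

```typescript
// Try each selector in order; resolve with the first one that works.
async function clickWithFallbacks(
  selectors: string[],
  tryClick: (selector: string) => Promise<void>,
): Promise<string> {
  const failures: string[] = [];
  for (const selector of selectors) {
    try {
      await tryClick(selector);
      return selector; // report which locator succeeded so the test can be updated
    } catch (error) {
      failures.push(`${selector}: ${(error as Error).message}`);
    }
  }
  throw new Error(`No selector matched:\n${failures.join('\n')}`);
}
```

Returning the winning selector gives a test report something concrete to show, for example that `#submit` stopped matching and `.submit-btn` took over.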

Intelligent Error Handling

try {
  await browser.click('#popup-close');
} catch (error) {
  // AI suggests: "Element not found. Try waiting for popup or check if already closed"
  const suggestion = await ai.analyzeError(error, browser.getPageContext());
  // Auto-retry with AI-suggested strategy
}

Visual Verification

// Instead of exact pixel matching
await browser.expectScreenshotToMatch('login-page.png');

// AI-powered visual verification
await browser.expectPageToLookLike('login page layout', {
  tolerance: 'semantic', // Understands intent vs exact pixels
  ignoreTemporalElements: true // Ignores dates, counters, etc.
});

Cloud-Native Execution

Modern automation should run anywhere:

// Local execution
const localBrowser = new Vibium();

// Remote execution on Fly.io
const remoteBrowser = new Vibium({
  remote: {
    provider: 'fly',
    region: 'ord',
    machine: 'shared-cpu-1x'
  }
});

// Container execution
const containerBrowser = new Vibium({
  container: {
    image: 'vibium/chrome:latest',
    resources: { cpu: '1x', memory: '1GB' }
  }
});

The same API, whether running locally or in the cloud.

Conclusion

Building Vibium from scratch taught me that sometimes you need to strip away abstractions to truly understand a technology. WebDriver BiDi represents the future of browser automation—combining the standardization of WebDriver with the power and performance of modern browser APIs.

The roughly 200 lines of TypeScript we wrote cover the core of what most automation needs: navigation, element interaction, screenshots, and network monitoring. No kitchen sink, no legacy baggage, just the essentials done right.

But more importantly, we now understand exactly how browser automation works at the protocol level. That knowledge will serve us well as we add AI-powered features and cloud-native execution.

The future of browser automation is light, lean, fast, and intelligent. And sometimes, to get there, you need to start from scratch.


Try Vibium yourself: npm install vibium

Want to contribute? The future of browser automation is being written right now.
