Puppeteer vision

Created 6 months ago

MCP server for scraping webpages and converting them to markdown using Puppeteer.

development location documentation public scraping AI

What is Puppeteer vision?

Uses Puppeteer to browse a webpage and return high-quality Markdown. AI vision capabilities handle cookie banners, CAPTCHAs, and other interactive elements automatically.

Documentation

Puppeteer vision MCP Server

This Model Context Protocol (MCP) server provides a tool for scraping webpages and converting them to markdown format using Puppeteer, Readability, and Turndown. It features AI-driven interaction capabilities to handle cookies, captchas, and other interactive elements automatically.

Features

  • Scrapes webpages using Puppeteer with stealth mode
  • Uses AI-powered interaction to automatically handle:
      • Cookie consent banners
      • CAPTCHAs
      • Newsletter or subscription prompts
      • Paywalls and login walls
      • Age verification prompts
      • Interstitial ads
      • Any other interactive elements blocking content
  • Extracts main content with Mozilla's Readability
  • Converts HTML to well-formatted Markdown
  • Special handling for code blocks, tables, and other structured content
  • Accessible via the Model Context Protocol
  • Option to view browser interaction in real-time by disabling headless mode
  • Easily consumable as an npx package

Quick Start with NPX

The recommended way to use this server is via npx, which ensures you're running the latest version without needing to clone or manually install.

  1. Prerequisites: Ensure you have Node.js and npm installed.
  2. Environment Setup: The server requires an OPENAI_API_KEY. You can provide this and other optional configurations in two ways:
      • .env file: Create a .env file in the directory where you will run the npx command.
      • Shell Environment Variables: Export the variables in your terminal session.
  3. Run the Server: Open your terminal and run:
    npx -y puppeteer-vision-mcp-server
    

Using as an MCP Tool with NPX

This server is designed to be integrated as a tool within an MCP-compatible LLM orchestrator.

Environment Configuration Details

Regardless of how you run the server (NPX or local development), it uses the following environment variables:

  • OPENAI_API_KEY: (Required) Your API key for accessing the vision model.
  • VISION_MODEL: (Optional) The model to use for vision analysis.
  • API_BASE_URL: (Optional) Custom API endpoint URL.
  • TRANSPORT_TYPE: (Optional) The transport protocol to use.
  • PORT: (Optional) The port for the HTTP server in SSE or HTTP mode.
  • DISABLE_HEADLESS: (Optional) Set to true to run the browser in visible mode.
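
As an illustration of how these variables fit together, here is a minimal sketch of reading them into a typed configuration object. It is not the server's actual code: the ServerConfig shape and the loadConfig function are hypothetical, and the lowercase values "stdio", "sse", and "http" for TRANSPORT_TYPE are an assumption based on the communication modes this document lists. No defaults are assumed for VISION_MODEL, API_BASE_URL, or PORT, because the text above does not state any.

```typescript
// Hypothetical sketch: names and shapes here are illustrative, not the project's API.
const TRANSPORTS = ["stdio", "sse", "http"] as const; // assumed accepted values
type TransportType = (typeof TRANSPORTS)[number];

interface ServerConfig {
  openaiApiKey: string;
  visionModel?: string; // left undefined when VISION_MODEL is unset
  apiBaseUrl?: string; // left undefined when API_BASE_URL is unset
  transportType: TransportType;
  port?: number; // left undefined when PORT is unset
  headless: boolean;
}

function isTransport(value: string): value is TransportType {
  return (TRANSPORTS as readonly string[]).indexOf(value) !== -1;
}

function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const apiKey = env.OPENAI_API_KEY;
  if (!apiKey) {
    throw new Error("OPENAI_API_KEY is required");
  }

  // stdio is documented as the default communication mode.
  const transport = (env.TRANSPORT_TYPE ?? "stdio").toLowerCase();
  if (!isTransport(transport)) {
    throw new Error(`Unsupported TRANSPORT_TYPE: ${transport}`);
  }

  let port: number | undefined;
  if (env.PORT !== undefined) {
    port = Number(env.PORT);
    if (!Number.isInteger(port) || port < 1 || port > 65535) {
      throw new Error(`Invalid PORT: ${env.PORT}`);
    }
  }

  return {
    openaiApiKey: apiKey,
    visionModel: env.VISION_MODEL,
    apiBaseUrl: env.API_BASE_URL,
    transportType: transport,
    port,
    // The browser stays headless unless DISABLE_HEADLESS is exactly "true".
    headless: env.DISABLE_HEADLESS !== "true",
  };
}
```

In a Node.js process this would typically be called as loadConfig(process.env). Taking the environment as a parameter keeps the function easy to exercise with plain objects.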

Communication Modes

The server supports three communication modes:

  1. stdio (Default): Communicates via standard input/output.
  2. SSE mode: Communicates via Server-Sent Events over HTTP.
  3. HTTP mode: Communicates via Streamable HTTP transport with session management.

Tool Usage (MCP Invocation)

The server provides a scrape-webpage tool.
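
For orientation, the sketch below builds the JSON-RPC 2.0 message an MCP client would send to invoke this tool. The tools/call method and the name/arguments envelope come from the Model Context Protocol. The url argument is an assumption, since this page does not list the tool's parameters, and buildScrapeRequest is a hypothetical helper rather than part of the server.

```typescript
// Envelope of an MCP tool invocation (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: builds a request for the scrape-webpage tool.
function buildScrapeRequest(id: number, url: string): ToolCallRequest {
  if (!/^https?:\/\//i.test(url)) {
    throw new Error(`Expected an http(s) URL, got: ${url}`);
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "scrape-webpage",
      // Assumption: the tool reads the target address from a `url` argument.
      arguments: { url },
    },
  };
}

console.log(JSON.stringify(buildScrapeRequest(1, "https://example.com/docs"), null, 2));
```

In practice an MCP client library builds this envelope for you. The sketch only shows what travels over the chosen transport.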

How It Works

The system uses vision-capable AI models to analyze screenshots of web pages and decide on actions like clicking, typing, or scrolling to bypass overlays and consent forms.
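
The loop described above can be sketched roughly as follows. Every name here (Action, PageDriver, Analyzer, clearBlockers) is hypothetical and not taken from the project's source, and the attempt limit is an assumed safeguard. In a real setup the page driver would wrap Puppeteer and the analyzer would call the vision model.

```typescript
// Hypothetical types: an action the vision model may request.
type Action =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "scroll"; deltaY: number }
  | { kind: "done" }; // nothing is blocking the content any more

type PerformableAction = Exclude<Action, { kind: "done" }>;

// Hypothetical abstraction over the browser page.
interface PageDriver {
  screenshot(): Promise<string>; // e.g. a base64-encoded image
  perform(action: PerformableAction): Promise<void>;
}

// Hypothetical abstraction over the vision model.
type Analyzer = (screenshot: string) => Promise<Action>;

// Repeats screenshot -> analyze -> act until the analyzer reports "done"
// or the attempt budget runs out. Returns true when the page was cleared.
async function clearBlockers(
  page: PageDriver,
  analyze: Analyzer,
  maxAttempts = 3, // assumed limit so a stubborn overlay cannot loop forever
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const shot = await page.screenshot();
    const action = await analyze(shot);
    if (action.kind === "done") {
      return true;
    }
    await page.perform(action);
  }
  return false;
}
```

Keeping the browser and the model behind small interfaces means the loop can be tested with fakes, without launching a browser or calling an API.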

Installation & Development (for Modifying the Code)

If you wish to contribute, modify the server, or run a local development version:

  1. Clone the Repository:
    git clone https://github.com/djannot/puppeteer-vision-mcp.git
    cd puppeteer-vision-mcp
    
  2. Install Dependencies:
    npm install
    
  3. Build the Project:
    npm run build
    
  4. Set Up Environment: Create a .env file in the project's root directory with your OPENAI_API_KEY and any other desired configurations.
  5. Run for Development:
    npm start
    
    Or, for automatic rebuilding on changes:
    npm run dev
    

Customization (for Developers)

You can modify the behavior of the scraper by editing:

  • src/ai/vision-analyzer.ts
  • src/ai/page-interactions.ts
  • src/scrapers/webpage-scraper.ts
  • src/utils/markdown-formatters.ts

Dependencies

Key dependencies include:

  • @modelcontextprotocol/sdk
  • puppeteer, puppeteer-extra
  • @mozilla/readability, jsdom
  • turndown, sanitize-html
  • openai (or compatible API for vision models)
  • express (for SSE mode)
  • zod

Server Config

{
  "mcpServers": {
    "puppeteer-vision-server": {
      "command": "npx",
      "args": [
        "-y",
        "puppeteer-vision-mcp-server"
      ]
    }
  }
}
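
The server requires OPENAI_API_KEY, so the client entry also needs a way to supply it unless a .env file or the shell already does. The sketch below generates an entry that passes the key through an env map. It assumes the MCP client accepts an optional env object per server entry, which many clients do, so check your client's documentation. buildClientConfig is a hypothetical helper, and the package name is the one used in the Quick Start command.

```typescript
// Assumed shape of one server entry in an MCP client's configuration file.
interface McpServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>; // assumption: the client forwards these to the process
}

// Hypothetical helper that builds the client configuration object.
function buildClientConfig(apiKey: string): { mcpServers: Record<string, McpServerEntry> } {
  return {
    mcpServers: {
      "puppeteer-vision-server": {
        command: "npx",
        args: ["-y", "puppeteer-vision-mcp-server"],
        env: { OPENAI_API_KEY: apiKey },
      },
    },
  };
}

// Print JSON that can be pasted into the client's configuration file.
console.log(JSON.stringify(buildClientConfig("YOUR_OPENAI_API_KEY"), null, 2));
```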

Links & Status

Repository: github.com
Hosted: No
Global: No
Official: No

Project Info

Created At: May 23, 2025
Updated At: Aug 07, 2025
Author: djannot
Category: community
License: MIT
Tags: development, location, documentation