What is Puppeteer vision?
Use Puppeteer to browse a webpage and return high-quality Markdown. AI vision capabilities handle cookie banners, CAPTCHAs, and other interactive elements automatically.
Documentation
Puppeteer vision MCP Server
This Model Context Protocol (MCP) server provides a tool for scraping webpages and converting them to Markdown using Puppeteer, Readability, and Turndown. It features AI-driven interaction capabilities to handle cookie banners, CAPTCHAs, and other interactive elements automatically.
Features
Scrapes webpages using Puppeteer with stealth mode
Uses AI-powered interaction to automatically handle:
Cookie consent banners
CAPTCHAs
Newsletter or subscription prompts
Paywalls and login walls
Age verification prompts
Interstitial ads
Any other interactive elements blocking content
Extracts main content with Mozilla's Readability
Converts HTML to well-formatted Markdown
Special handling for code blocks, tables, and other structured content
Accessible via the Model Context Protocol
Option to view browser interaction in real time by disabling headless mode
Runs directly via npx, with no clone or manual install
Quick Start with NPX
The recommended way to use this server is via npx, which ensures you're running the latest version without needing to clone or manually install.
Prerequisites: Ensure you have Node.js and npm installed.
Environment Setup: The server requires an OPENAI_API_KEY. You can provide this and other optional configurations in two ways:
.env file: Create a .env file in the directory where you will run the npx command.
Shell Environment Variables: Export the variables in your terminal session.
Run the Server: Open your terminal and run:
npx -y puppeteer-vision-mcp-server
Using as an MCP Tool with NPX
This server is designed to be integrated as a tool within an MCP-compatible LLM orchestrator.
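As a rough illustration of what such an integration might look like, the sketch below builds a client configuration entry that launches the server with the npx command shown earlier. The top-level `mcpServers` key follows a convention used by several MCP clients and is an assumption here, as is the entry name `puppeteer-vision`; check your orchestrator's documentation for the exact shape it expects.

```typescript
// Sketch only: the "mcpServers" layout and the entry name are assumptions,
// not something this README specifies.
interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

function buildClientConfig(apiKey: string): { mcpServers: Record<string, McpServerEntry> } {
  return {
    mcpServers: {
      // Hypothetical entry name; any label the client accepts would do.
      "puppeteer-vision": {
        command: "npx",
        args: ["-y", "puppeteer-vision-mcp-server"],
        // The server reads its required key from the environment.
        env: { OPENAI_API_KEY: apiKey },
      },
    },
  };
}

// Print the JSON that could be pasted into a client's configuration file.
console.log(JSON.stringify(buildClientConfig("YOUR_OPENAI_API_KEY"), null, 2));
```

Optional variables such as DISABLE_HEADLESS could be added to the same `env` map.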
Environment Configuration Details
Regardless of how you run the server (NPX or local development), it uses the following environment variables:
OPENAI_API_KEY: (Required) Your API key for accessing the vision model.
VISION_MODEL: (Optional) The model to use for vision analysis.
API_BASE_URL: (Optional) Custom API endpoint URL.
TRANSPORT_TYPE: (Optional) The transport protocol to use.
PORT: (Optional) The port for the HTTP server in SSE or HTTP mode.
DISABLE_HEADLESS: (Optional) Set to true to run the browser in visible mode.
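To make the required/optional split above concrete, here is a minimal sketch of how these variables could be read and validated. It is not the server's actual code: the `ServerConfig` type and `loadConfig` function are hypothetical names, and no default values are assumed for the optional settings because none are stated here.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface ServerConfig {
  openaiApiKey: string;
  visionModel: string | undefined;
  apiBaseUrl: string | undefined;
  transportType: string | undefined;
  port: number | undefined;
  headless: boolean;
}

function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const openaiApiKey = env.OPENAI_API_KEY;
  if (!openaiApiKey) {
    // The only variable documented as required.
    throw new Error("OPENAI_API_KEY is required");
  }

  let port: number | undefined;
  if (env.PORT !== undefined) {
    port = Number(env.PORT);
    if (!Number.isInteger(port) || port < 1 || port > 65535) {
      throw new Error(`Invalid PORT: ${env.PORT}`);
    }
  }

  return {
    openaiApiKey,
    visionModel: env.VISION_MODEL,
    apiBaseUrl: env.API_BASE_URL,
    transportType: env.TRANSPORT_TYPE,
    port,
    // DISABLE_HEADLESS=true means the browser window is visible.
    headless: env.DISABLE_HEADLESS !== "true",
  };
}

// Example with an in-memory environment.
const sampleConfig = loadConfig({ OPENAI_API_KEY: "sk-example", DISABLE_HEADLESS: "true" });
console.log(sampleConfig.headless);
```

In a Node.js process the same function could be called with `process.env`.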
Communication Modes
The server supports three communication modes:
stdio (Default): Communicates via standard input/output.
SSE mode: Communicates via Server-Sent Events over HTTP.
HTTP mode: Communicates via Streamable HTTP transport with session management.
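The mode selection can be pictured as a small lookup on TRANSPORT_TYPE. The sketch below assumes the accepted values are the literal strings `stdio`, `sse`, and `http`; the exact strings are not given in this document, so treat them, and the `resolveTransport` and `needsPort` helpers, as illustrative.

```typescript
// Assumed value set for TRANSPORT_TYPE; not confirmed by this README.
type TransportMode = "stdio" | "sse" | "http";

function resolveTransport(raw: string | undefined): TransportMode {
  // stdio is documented as the default mode.
  if (raw === undefined || raw === "") return "stdio";
  if (raw === "stdio" || raw === "sse" || raw === "http") return raw;
  throw new Error(`Unsupported TRANSPORT_TYPE: ${raw}`);
}

// PORT only matters when the server listens over HTTP (SSE or HTTP mode).
function needsPort(mode: TransportMode): boolean {
  return mode !== "stdio";
}

console.log(resolveTransport(undefined), needsPort(resolveTransport("sse")));
```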
Tool Usage (MCP Invocation)
The server provides a scrape-webpage tool.
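For orientation, an MCP client invokes a tool by sending a JSON-RPC `tools/call` request that names the tool and passes its arguments. The sketch below builds such a request for scrape-webpage. The `url` argument name is an assumption, since the tool's input schema is not listed here; a client can discover the real schema with a `tools/list` request.

```typescript
// Shape of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildScrapeRequest(id: number, url: string): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "scrape-webpage",
      // Assumed argument name; verify against the tool's published schema.
      arguments: { url },
    },
  };
}

console.log(JSON.stringify(buildScrapeRequest(1, "https://example.com"), null, 2));
```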
How It Works
The system uses vision-capable AI models to analyze screenshots of web pages and decide on actions like clicking, typing, or scrolling to bypass overlays and consent forms.
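The screenshot, decide, act cycle described above can be sketched as a bounded loop. Everything here is hypothetical scaffolding rather than the project's implementation: `PageDriver` stands in for the Puppeteer page, `Decide` stands in for the call to the vision model, and the `maxSteps` cap of 5 is an arbitrary choice for the example.

```typescript
// Actions the model may request; "done" means nothing is blocking the content.
type Action =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "scroll"; deltaY: number }
  | { kind: "done" };

// Hypothetical abstraction over the browser page.
interface PageDriver {
  screenshot(): Promise<string>; // e.g. a base64-encoded image
  perform(action: Exclude<Action, { kind: "done" }>): Promise<void>;
}

// Hypothetical abstraction over the vision model call.
type Decide = (screenshot: string) => Promise<Action>;

// Repeats screenshot -> decide -> act until the model reports "done"
// or the step budget runs out. Returns the number of actions performed.
async function clearBlockers(page: PageDriver, decide: Decide, maxSteps = 5): Promise<number> {
  for (let step = 0; step < maxSteps; step++) {
    const shot = await page.screenshot();
    const action = await decide(shot);
    if (action.kind === "done") return step;
    await page.perform(action);
  }
  return maxSteps;
}
```

Capping the number of steps keeps a confused model from clicking indefinitely; once the loop ends, content extraction can proceed on whatever the page shows.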
Installation & Development (for Modifying the Code)
If you wish to contribute, modify the server, or run a local development version:
Clone the Repository:
git clone https://github.com/djannot/puppeteer-vision-mcp.git
cd puppeteer-vision-mcp
Install Dependencies:
npm install
Build the Project:
npm run build
Set Up Environment: Create a .env file in the project's root directory with your OPENAI_API_KEY and any other desired configurations.
Run for Development:
npm start
Or, for automatic rebuilding on changes:
npm run dev
Customization (for Developers)
You can modify the behavior of the scraper by editing: