mcp-read-website-fast

Created 4 months ago

Fast, token-efficient web content extraction for AI agents - converts websites to clean Markdown.

development documentation public web scraping markdown

What is mcp-read-website-fast?

Fast, token-efficient web content extraction that converts websites to clean Markdown. Features Mozilla Readability, smart caching, polite crawling with robots.txt support, and concurrent fetching with minimal dependencies.

Documentation

@just-every/mcp-read-website-fast

Overview

Existing MCP web crawlers are slow and consume large quantities of tokens. This MCP package fetches web pages locally, strips noise, and converts content to clean Markdown while preserving links. It is designed for Claude Code, IDEs, and LLM pipelines, with a minimal token footprint and few dependencies.

Features

  • Fast startup using official MCP SDK with lazy loading for optimal performance
  • Content extraction using Mozilla Readability (same as Firefox Reader View)
  • HTML to Markdown conversion with Turndown + GFM support
  • Smart caching with SHA-256 hashed URLs
  • Polite crawling with robots.txt support and rate limiting
  • Concurrent fetching with configurable depth crawling
  • Stream-first design for low memory usage
  • Link preservation for knowledge graphs
  • Optional chunking for downstream processing
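
To make the "SHA-256 hashed URLs" bullet concrete, here is a minimal sketch of how a URL could be turned into a cache key. The helper names (cacheKeyForUrl, cachePathForUrl), the fragment stripping, and the one-file-per-key layout are assumptions for illustration; the package's actual cache module may differ.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: map a URL to a stable, filesystem-safe cache key.
function cacheKeyForUrl(rawUrl: string): string {
  // Parse and re-serialize so trivially different spellings (e.g. host case)
  // share one key, and drop the fragment because it is never sent to the server.
  const url = new URL(rawUrl);
  url.hash = "";
  return createHash("sha256").update(url.toString()).digest("hex");
}

// Hypothetical layout (assumption): one Markdown file per key in the cache directory.
function cachePathForUrl(rawUrl: string, cacheDir = ".cache"): string {
  return `${cacheDir}/${cacheKeyForUrl(rawUrl)}.md`;
}
```

Hashing keeps file names at a fixed length and free of characters that are awkward on disk, whatever the URL contains.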

Installation

Claude Code

claude mcp add read-website-fast -s user -- npx -y @just-every/mcp-read-website-fast

VS Code

code --add-mcp '{"name":"read-website-fast","command":"npx","args":["-y","@just-every/mcp-read-website-fast"]}'

Cursor

cursor://anysphere.cursor-deeplink/mcp/install?name=read-website-fast&config=eyJyZWFkLXdlYnNpdGUtZmFzdCI6eyJjb21tYW5kIjoibnB4IiwiYXJncyI6WyIteSIsIkBqdXN0LWV2ZXJ5L21jcC1yZWFkLXdlYnNpdGUtZmFzdCJdfX0=
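
The config parameter in the deeplink above is Base64-encoded JSON. The sketch below decodes it so you can inspect what Cursor would install. The function name decodeCursorDeeplink and the McpServerEntry type are illustrative, not part of Cursor or this package.

```typescript
// The deeplink from the Cursor section above.
const deeplink =
  "cursor://anysphere.cursor-deeplink/mcp/install?name=read-website-fast" +
  "&config=eyJyZWFkLXdlYnNpdGUtZmFzdCI6eyJjb21tYW5kIjoibnB4IiwiYXJncyI6WyIteSIsIkBqdXN0LWV2ZXJ5L21jcC1yZWFkLXdlYnNpdGUtZmFzdCJdfX0=";

// Illustrative type for one server entry.
interface McpServerEntry {
  command: string;
  args: string[];
}

// Hypothetical helper: extract the server name and decoded config from the link.
function decodeCursorDeeplink(link: string): {
  name: string | null;
  config: Record<string, McpServerEntry>;
} {
  const params = new URL(link).searchParams;
  const encoded = params.get("config") ?? "";
  // The payload is plain Base64, so decode it to UTF-8 text and parse as JSON.
  const json = Buffer.from(encoded, "base64").toString("utf8");
  return { name: params.get("name"), config: JSON.parse(json) };
}

console.log(JSON.stringify(decodeCursorDeeplink(deeplink).config, null, 2));
```

The decoded payload carries the same npx command and arguments as the other installation methods.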

JetBrains IDEs

Settings → Tools → AI Assistant → Model Context Protocol (MCP) → Add. Choose “As JSON” and paste:

{"command":"npx","args":["-y","@just-every/mcp-read-website-fast"]}

Raw JSON (works in any MCP client)

{
  "mcpServers": {
    "read-website-fast": {
      "command": "npx",
      "args": ["-y", "@just-every/mcp-read-website-fast"]
    }
  }
}

Drop this into your client’s mcp.json (e.g. .vscode/mcp.json, ~/.cursor/mcp.json, or .mcp.json for Claude).

Available Tools

  • read_website - Fetches a webpage and converts it to clean Markdown
    Parameters:
      • url (required): The HTTP/HTTPS URL to fetch
      • pages (optional): Maximum number of pages to crawl (default: 1, max: 100)
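
MCP clients invoke tools with a JSON-RPC tools/call request. As a rough sketch, the request for read_website could be built as below. The helper name buildReadWebsiteCall and the client-side validation are assumptions that simply mirror the documented parameter limits; how the server validates input itself is not described here.

```typescript
interface ReadWebsiteArgs {
  url: string;
  pages?: number;
}

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: "read_website"; arguments: ReadWebsiteArgs };
}

// Hypothetical helper: build a tools/call request for read_website.
function buildReadWebsiteCall(id: number, url: string, pages = 1): ToolCallRequest {
  // Only HTTP/HTTPS URLs are documented as accepted.
  const protocol = new URL(url).protocol;
  if (protocol !== "http:" && protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${protocol}`);
  }
  // Documented bounds: default 1, max 100.
  if (!Number.isInteger(pages) || pages < 1 || pages > 100) {
    throw new RangeError("pages must be an integer between 1 and 100");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "read_website", arguments: { url, pages } },
  };
}

console.log(JSON.stringify(buildReadWebsiteCall(1, "https://example.com/article")));
```

In practice an MCP client library usually builds this envelope for you; the sketch only shows what goes over the wire.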

Available Resources

  • read-website-fast://status - Get cache statistics
  • read-website-fast://clear-cache - Clear the cache directory

Development Usage

Install

npm install
npm run build

Single page fetch

npm run dev fetch https://example.com/article

Crawl with depth

npm run dev fetch https://example.com --depth 2 --concurrency 5
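
The --concurrency flag caps how many requests run at once. One common way to implement such a cap is a small worker pool, sketched below. This is a generic illustration and not the package's crawler code; mapWithConcurrency is a hypothetical name, and depth handling is left out.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight at a time.
// Results are returned in the same order as the input.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each lane pulls the next unclaimed item until none are left.
  async function runLane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => runLane()));
  return results;
}
```

For example, mapWithConcurrency(urls, 5, fetchPage) would keep at most five fetches running, where fetchPage stands for whatever function downloads a single page.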

Output formats

npm run dev fetch https://example.com

# JSON output with metadata
npm run dev fetch https://example.com --output json

# Both URL and markdown
npm run dev fetch https://example.com --output both

CLI Options

  • -p, --pages - Maximum number of pages to crawl (default: 1)
  • -c, --concurrency - Max concurrent requests (default: 3)
  • --no-robots - Ignore robots.txt
  • --all-origins - Allow cross-origin crawling
  • -u, --user-agent - Custom user agent
  • --cache-dir - Cache directory (default: .cache)
  • -t, --timeout - Request timeout in milliseconds (default: 30000)
  • -o, --output - Output format: json, markdown, or both (default: markdown)
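
For readers scripting around the CLI, the documented defaults can be summarized as a typed options object. The field names below are hypothetical; only the default values come from the list above. The robots and allOrigins defaults are inferred from --no-robots and --all-origins being opt-in flags.

```typescript
type OutputFormat = "json" | "markdown" | "both";

// Hypothetical shape; field names are illustrative, values mirror the CLI list.
interface FetchOptions {
  pages: number;
  concurrency: number;
  robots: boolean; // inferred: true unless --no-robots is passed
  allOrigins: boolean; // inferred: false unless --all-origins is passed
  userAgent?: string;
  cacheDir: string;
  timeout: number; // milliseconds
  output: OutputFormat;
}

const DEFAULT_OPTIONS: FetchOptions = {
  pages: 1,
  concurrency: 3,
  robots: true,
  allOrigins: false,
  cacheDir: ".cache",
  timeout: 30000,
  output: "markdown",
};

// Merge caller overrides on top of the documented defaults.
function resolveOptions(overrides: Partial<FetchOptions> = {}): FetchOptions {
  return { ...DEFAULT_OPTIONS, ...overrides };
}
```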

Clear cache

npm run dev clear-cache

Auto-Restart Feature

The MCP server includes automatic restart capability by default for improved reliability:

  • Automatically restarts the server if it crashes
  • Handles unhandled exceptions and promise rejections
  • Implements exponential backoff (max 10 attempts in 1 minute)
  • Logs all restart attempts for monitoring
  • Gracefully handles shutdown signals (SIGINT, SIGTERM)
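
The restart budget described above (exponential backoff, at most 10 attempts in 1 minute) could be modeled roughly as follows. The attempt limit and window come from the list; the base delay, the delay cap, and the RestartPolicy class itself are assumptions for illustration, not the wrapper's actual code.

```typescript
const MAX_ATTEMPTS = 10; // from the list above
const WINDOW_MS = 60_000; // from the list above (1 minute)
const BASE_DELAY_MS = 1_000; // assumption
const MAX_DELAY_MS = 30_000; // assumption

// Hypothetical policy object tracking recent restart attempts.
class RestartPolicy {
  private attempts: number[] = [];

  // Returns the delay before the next restart in milliseconds,
  // or null when the attempt budget for the current window is spent.
  nextDelay(now: number = Date.now()): number | null {
    // Forget attempts that have aged out of the sliding window.
    this.attempts = this.attempts.filter((t) => now - t < WINDOW_MS);
    if (this.attempts.length >= MAX_ATTEMPTS) return null;
    // Double the delay for each recent attempt, up to the cap.
    const delay = Math.min(BASE_DELAY_MS * 2 ** this.attempts.length, MAX_DELAY_MS);
    this.attempts.push(now);
    return delay;
  }
}
```

A supervisor would call nextDelay() after each crash, wait that long before respawning the server, and exit once it returns null.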

For development/debugging without auto-restart:

npm run serve:dev

Architecture

mcp/
├── src/
│   ├── crawler/ # URL fetching, queue management, robots.txt
│   ├── parser/ # DOM parsing, Readability, Turndown conversion
│   ├── cache/ # Disk-based caching with SHA-256 keys
│   ├── utils/ # Logger, chunker utilities
│   ├── index.ts # CLI entry point
│   ├── serve.ts # MCP server entry point
│   └── serve-restart.ts # Auto-restart wrapper
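
The tree lists a chunker among the utilities. As an illustration of what chunking Markdown for downstream processing can look like, here is a simple paragraph-based splitter. It is a sketch under assumed behavior (split on blank lines, pack blocks up to a character budget), not the project's implementation; chunkMarkdown and maxChars are hypothetical names.

```typescript
// Split Markdown on blank lines and pack consecutive blocks into chunks
// of at most `maxChars` characters. A single block that exceeds the
// budget is kept whole rather than cut mid-paragraph.
function chunkMarkdown(markdown: string, maxChars = 2000): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const block of markdown.split(/\n{2,}/)) {
    const candidate = current ? `${current}\n\n${block}` : block;
    if (candidate.length <= maxChars || current === "") {
      current = candidate;
    } else {
      chunks.push(current);
      current = block;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Splitting on paragraph boundaries keeps headings, lists, and links intact inside each chunk, which matters when chunks are embedded or summarized independently.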

Development

npm run dev fetch https://example.com

# Build for production
npm run build

# Run tests
npm test

# Type checking
npm run typecheck

# Linting
npm run lint

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Submit a pull request

Troubleshooting

Cache Issues

npm run dev clear-cache

Timeout Errors

  • Increase timeout with -t flag
  • Check network connectivity
  • Verify URL is accessible

Content Not Extracted

  • Some sites block automated access
  • Try custom user agent with -u flag
  • Check if site requires JavaScript (not supported)

License

MIT

Server Config

{
  "mcpServers": {
    "mcp-read-website-fast-server": {
      "command": "npx",
      "args": ["-y", "@just-every/mcp-read-website-fast"]
    }
  }
}

Links & Status

Repository: github.com
Hosted: No
Global: No
Official: Yes

Project Info

Created At: Jul 17, 2025
Updated At: Aug 07, 2025
Author: just-every
Category: community
License: MIT
Tags:
development documentation public