VisionAgent MCP Server
Beta – v0.1
This project is early access and subject to breaking changes until v1.0.
Overview
Modern LLM “agents” call external tools through the Model Context Protocol (MCP). VisionAgent MCP is a lightweight, side-car MCP server that runs locally on STDIN/STDOUT, translating each tool call from an MCP-compatible client (Claude Desktop, Cursor, Cline, etc.) into an authenticated HTTPS request to Landing AI’s VisionAgent REST APIs. The response JSON, plus any images or masks, is streamed back to the model so that you can issue natural-language computer-vision and document-analysis commands from your editor without writing custom REST code or loading an extra SDK.
- Pixel-perfect masks via Florence-2 + Segment-Anything-v2 (SAM-2).
- activity-recognition: Recognise multiple activities in video with start/end timestamps.
- depth-pro: High-resolution monocular depth estimation for single images.
Run npm run generate-tools whenever VisionAgent releases new endpoints. The script fetches the latest OpenAPI spec and regenerates the local tool map automatically.
# 1 Install the server globally
npm install -g vision-tools-mcp

# 2 Configure your MCP client with the following settings:
{
  "mcpServers": {
    "VisionAgent": {
      "command": "npx",
      "args": ["vision-tools-mcp"],
      "env": {
        "VISION_AGENT_API_KEY": "<YOUR_API_KEY>",
        "OUTPUT_DIRECTORY": "/path/to/output/directory",
        "IMAGE_DISPLAY_ENABLED": "true"
      }
    }
  }
}

Set IMAGE_DISPLAY_ENABLED to "false" to skip rendering; see Configuration below. (JSON does not allow comments, so the setting cannot be annotated inline.)
Open your MCP-aware client.
Download street.png from the assets folder in this directory, or choose any test image.
Paste the prompt below (or any prompt):
Detect all traffic lights in /path/to/mcp/vision-agent-mcp/assets/street.png
If your client supports inline resources, you’ll see bounding-box overlays; otherwise, the PNG is saved to your output directory, and the chat shows its path.
Prerequisites

| Software | Minimum Version |
| --- | --- |
| Node.js | 20 (LTS) |
| VisionAgent account | Any paid or free tier (needs API key) |
| MCP client | Claude Desktop / Cursor / Cline / etc. |
⚙️ Configuration
| ENV var | Required | Default | Purpose |
| --- | --- | --- | --- |
| VISION_AGENT_API_KEY | Yes | — | Landing AI auth token. |
| OUTPUT_DIRECTORY | No | — | Where rendered images / masks / depth maps are stored. |
| IMAGE_DISPLAY_ENABLED | No | true | false ➜ skip rendering |
Sample MCP client entry (.mcp.json for VS Code / Cursor)
For MCP clients without image display capabilities, like Cursor, set IMAGE_DISPLAY_ENABLED to "false". For MCP clients with image display capabilities, like Claude Desktop, set IMAGE_DISPLAY_ENABLED to "true" to visualize tool outputs. Generally, MCP clients that support resources (see the list at https://modelcontextprotocol.io/clients) will support image display.
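As an illustration of how these variables might be consumed, here is a hypothetical sketch of a config loader; the interface and function names are assumptions, not the project's actual source:

```typescript
// Hypothetical config loader for the env vars documented above.
interface ServerConfig {
  apiKey: string;
  outputDirectory?: string;
  imageDisplayEnabled: boolean;
}

function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const apiKey = env.VISION_AGENT_API_KEY;
  if (!apiKey) {
    // The API key is the only required variable.
    throw new Error("VISION_AGENT_API_KEY is required");
  }
  return {
    apiKey,
    outputDirectory: env.OUTPUT_DIRECTORY,
    // Defaults to true; only the literal string "false" disables rendering
    // (an assumption for this sketch, matching the documented default).
    imageDisplayEnabled: env.IMAGE_DISPLAY_ENABLED !== "false",
  };
}
```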
💡 Example Prompts
| Scenario | Prompt (after uploading file) |
| --- | --- |
| Invoice extraction | “Extract vendor, invoice date & total from this PDF using agentic-document-analysis.” |
| Pedestrian recognition | “Locate every pedestrian in street.jpg via text-to-object-detection.” |
| Agricultural segmentation | “Segment all tomatoes in kitchen.png with text-to-instance-segmentation.” |
| Activity recognition (video) | “Identify activities occurring in match.mp4 via activity-recognition.” |
| Depth estimation | “Produce a depth map for selfie.png using depth-pro.” |
🏗 Architecture & Flow
┌────────────────────┐ 1. human prompt ┌───────────────────┐
│ MCP-capable client │───────────────────────────▶│ VisionAgent MCP │
│ (Cursor, Claude) │ │ (this repo) │
└────────────────────┘ └─────────▲─────────┘
▲ 6. rendered PNG / JSON │ 2. JSON tool call
│ │
│ 5. preview path / data 3. HTTPS │
│ ▼
local disk ◀──────────┐ Landing AI VisionAgent
└────────────── Cloud APIs
4. JSON / media blob
1. **Prompt → tool-call:** The client converts your natural-language prompt into a structured MCP call.
2. **Validation:** The server validates args with Zod schemas derived from the live OpenAPI spec.
3. **Forward:** An authenticated Axios request hits the VisionAgent endpoint.
4. **Response:** JSON + any base64 media are returned.
5. **Visualization:** If enabled, masks / boxes / depth maps are rendered to files.
6. **Return to chat:** The MCP client receives data + file paths (or inline previews).
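The validation step can be sketched as follows. The real server derives Zod schemas from the live OpenAPI spec, so the hand-written check and the argument shape below are purely illustrative:

```typescript
// Illustrative stand-in for a generated Zod schema: validate the arguments
// of a hypothetical text-to-object-detection tool call before forwarding it.
interface DetectArgs {
  image: string;   // path or base64 payload (assumed shape)
  prompt: string;  // e.g. "traffic lights"
}

function validateDetectArgs(args: unknown): DetectArgs {
  const a = args as Partial<DetectArgs> | null;
  if (!a || typeof a.image !== "string" || typeof a.prompt !== "string") {
    // Invalid params correspond to JSON-RPC error code -32602.
    throw new Error("Invalid params for text-to-object-detection");
  }
  return { image: a.image, prompt: a.prompt };
}
```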
🧑💻 Developer Guide
Here’s how to dive into the code, add new endpoints, or troubleshoot issues.
Note: Replace /path/to/build/index.js with the actual path to your built index.js file, and set your environment variables as needed (see Configuration above for IMAGE_DISPLAY_ENABLED guidance).
To fetch the latest OpenAPI spec and regenerate toolDefinitionMap.ts, run:

npm run build:all

This convenience script runs npm run build followed by npm run generate-tools.
Pro Tip: If you modify any files under src/ or want to pick up new endpoints from VisionAgent, run npm run build:all to recompile + regenerate tool definitions.
Network Errors
Axios errors (timeouts, 5xx) are caught and returned as:
{
  "id": 4,
  "error": {
    "code": -32000,
    "message": "VisionAgent API error: 502 Bad Gateway"
  }
}
Internal Exceptions
Uncaught exceptions in handlers produce:
{
  "id": 5,
  "error": {
    "code": -32603,
    "message": "Internal error: Unexpected token in JSON at position 345"
  }
}
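Both shapes could be produced by a small mapper like the one below; the helper name and signature are assumptions, only the error codes and message formats come from the examples above:

```typescript
interface JsonRpcError {
  id: number;
  error: { code: number; message: string };
}

// Map an upstream HTTP failure to the -32000 shape shown above, and any
// other thrown exception to the -32603 internal-error shape.
function toJsonRpcError(id: number, err: unknown, httpStatus?: string): JsonRpcError {
  if (httpStatus !== undefined) {
    return {
      id,
      error: { code: -32000, message: `VisionAgent API error: ${httpStatus}` },
    };
  }
  const msg = err instanceof Error ? err.message : String(err);
  return { id, error: { code: -32603, message: `Internal error: ${msg}` } };
}
```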
🛟 Troubleshooting
Verify VISION_AGENT_API_KEY is correct and active.
Free tiers have rate limits—check your dashboard.
Ensure outbound HTTPS to api.va.landing.ai isn’t blocked by a proxy/VPN.
The local tool map may be stale. Run:
npm run generate-tools
npm start
The code uses the Blob & FormData APIs that ship natively with Node 20.
Upgrade via nvm install 20 (macOS/Linux) or download from nodejs.org on Windows.
Also note that specific clients have their own helpful documentation. For example, if you are using the OpenAI Agents SDK, refer to its documentation at https://openai.github.io/openai-agents-python/mcp/
🤝 Contributing
We love PRs!
Fork → git checkout -b feature/my-feature.
npm run typecheck (no errors)
Open a PR explaining what and why.
🔒 Security & Privacy
The MCP server runs locally, so no files are forwarded anywhere except to the Landing AI API endpoints you explicitly call.
Output images/masks are written to OUTPUT_DIRECTORY on your machine only.