VisionAgent MCP

Created 7 months ago

VisionAgent MCP Server is a lightweight, side-car MCP server for natural-language computer vision commands.


development location documentation public beta

What is VisionAgent MCP?

A simple MCP server that enables your LLM to better reason over images, video and documents.

Documentation

VisionAgent MCP Server

Overview

Modern LLM “agents” call external tools through the Model Context Protocol (MCP). VisionAgent MCP is a lightweight, side-car MCP server that runs locally and communicates over STDIN/STDOUT, translating each tool call from an MCP-compatible client into an authenticated HTTPS request to Landing AI’s VisionAgent REST APIs. The response JSON, plus any images or masks, is streamed back to the model. This lets you issue natural-language computer-vision and document-analysis commands from your editor without writing custom REST code or loading an extra SDK.
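To make this flow concrete, here is a minimal Python sketch of the JSON-RPC 2.0 message an MCP client writes to the server's STDIN when it invokes a tool. The `tools/call` method and the `name` / `arguments` fields come from the MCP specification; the argument keys used here (`imagePath`, `prompts`) are hypothetical placeholders, not the server's confirmed input schema.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a single line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # The stdio transport frames messages as newline-delimited JSON,
    # so the serialized message must not contain embedded newlines.
    return json.dumps(message)


# Hypothetical argument names, for illustration only.
line = build_tool_call(
    1,
    "text-to-object-detection",
    {"imagePath": "street.jpg", "prompts": ["pedestrian"]},
)
print(line)
```

In practice the MCP client builds and sends this message for you; the sketch only shows what crosses the STDIN/STDOUT boundary before the server turns it into an HTTPS request.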

Supported Use Cases (v0.1)

Capability	Description
`agentic-document-analysis`	Parse PDFs and images to extract text, tables, charts, and diagrams, taking layout and other visual cues into account.
`text-to-object-detection`	Detect objects described by free-form text prompts using OWLv2 / CountGD / Florence-2 / Agentic Object Detection; outputs bounding boxes.
`text-to-instance-segmentation`	Pixel-perfect masks via Florence-2 + Segment-Anything-v2 (SAM-2).
`activity-recognition`	Recognise multiple activities in video with start/end timestamps.
`depth-pro`	High-resolution monocular depth estimation for single images.

Quick Start

Get Your VisionAgent API Key

If you do not have a VisionAgent API key, create an account and obtain your API key.

Installation

Install: npm install -g vision-tools-mcp
Configure your MCP client to launch the server using the settings under Server Config, and supply your API key as VISION_AGENT_API_KEY.

Example Prompts

Invoice extraction: “Extract vendor, invoice date & total from this PDF using agentic-document-analysis.”
Pedestrian detection: “Locate every pedestrian in street.jpg via text-to-object-detection.”

Troubleshooting

Verify VISION_AGENT_API_KEY is correct and active.
Ensure outbound HTTPS to api.va.landing.ai isn’t blocked by a proxy/VPN.
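Both checks can be scripted. The following is a minimal Python sketch, assuming the key is supplied through the `VISION_AGENT_API_KEY` environment variable; the helper names `api_key_present` and `can_reach` are made up for this example. It only confirms that a key is set (not that it is valid or active) and that a TCP connection to port 443 can be opened (not that a proxy leaves the HTTPS traffic untouched).

```python
import os
import socket

API_HOST = "api.va.landing.ai"  # host named in the troubleshooting list


def api_key_present(env=None) -> bool:
    """Return True if VISION_AGENT_API_KEY is set to a non-empty value."""
    env = os.environ if env is None else env
    return bool(env.get("VISION_AGENT_API_KEY", "").strip())


def can_reach(host: str = API_HOST, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Run `api_key_present()` and `can_reach()` from the same shell or environment that launches your MCP client, since that is the environment the server inherits.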

Server Config

{
  "mcpServers": {
    "visionagent-mcp-server": {
      "command": "npx",
      "args": [
        "vision-tools-mcp"
      ],
      "env": {
        "VISION_AGENT_API_KEY": "<YOUR_API_KEY>"
      }
    }
  }
}
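If your client already has other MCP servers configured, the entry needs to be merged in rather than pasted over the file. Here is a minimal Python sketch of that merge. It assumes the package name from the Installation step (`vision-tools-mcp`) and assumes your client passes an `env` block through to the spawned process; `add_server_entry` is a hypothetical helper, and the location of the config file depends on your client.

```python
import json


def add_server_entry(config: dict, api_key: str, package: str = "vision-tools-mcp") -> dict:
    """Return a copy of `config` with a VisionAgent entry under `mcpServers`."""
    servers = dict(config.get("mcpServers", {}))  # keep any existing servers
    servers["visionagent-mcp-server"] = {
        "command": "npx",
        "args": [package],
        # Assumption: the client forwards `env` to the spawned process.
        "env": {"VISION_AGENT_API_KEY": api_key},
    }
    return {**config, "mcpServers": servers}


# Example: an existing config that already lists another (made-up) server.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
updated = add_server_entry(existing, "<YOUR_API_KEY>")
print(json.dumps(updated, indent=2))
```

The function returns a new dictionary instead of modifying its input, so you can inspect the result before writing it back to disk.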

Links & Status

Repository: github.com

Hosted: No

Global: No

Official: Yes

Project Info


Created At: Jul 07, 2025

Updated At: Aug 07, 2025

Author: LandingAI Team

Category: official

License: MIT

Tags:

development location documentation