VisionAgent MCP

Created 4 months ago

VisionAgent MCP Server is a lightweight, side-car MCP server for natural-language computer vision commands.

development location documentation public beta

What is VisionAgent MCP?

A simple MCP server that enables your LLM to better reason over images, video and documents.

Documentation

VisionAgent MCP Server

Overview

Modern LLM “agents” call external tools through the Model Context Protocol (MCP). VisionAgent MCP is a lightweight, side-car MCP server that runs locally and communicates with the client over STDIN/STDOUT. It translates each tool call from an MCP-compatible client into an authenticated HTTPS request to Landing AI’s VisionAgent REST APIs. The response JSON, plus any images or masks, is streamed back to the model, so you can issue natural-language computer-vision and document-analysis commands from your editor without writing custom REST code or loading an extra SDK.
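
To make the translation step in the overview concrete, here is a minimal sketch of how one `tools/call` message read from STDIN could be mapped to an HTTPS request and how the reply could be wrapped for the client. It is not the server's actual code. The JSON-RPC shapes follow the MCP specification, but the URL path in `TOOL_PATHS`, the `Bearer` authorization scheme, and both function names are assumptions made for illustration; only the host name comes from this page.

```python
import json

# Host taken from the Troubleshooting section of this page.
BASE_URL = "https://api.va.landing.ai"
# HYPOTHETICAL path: the real REST routes are not documented here.
TOOL_PATHS = {"text-to-object-detection": "/v1/tools/text-to-object-detection"}


def build_http_request(rpc_line: str, api_key: str) -> dict:
    """Describe the HTTPS request for one JSON-RPC `tools/call` line."""
    message = json.loads(rpc_line)
    if message.get("method") != "tools/call":
        raise ValueError("not a tool call")
    params = message["params"]
    return {
        "id": message["id"],  # kept so the reply can be matched to the call
        "method": "POST",
        "url": BASE_URL + TOOL_PATHS[params["name"]],
        # ASSUMPTION: the auth scheme is illustrative, not confirmed.
        "headers": {"Authorization": f"Bearer {api_key}"},
        "json": params.get("arguments", {}),
    }


def build_rpc_response(call_id, payload: dict) -> str:
    """Wrap an API payload as an MCP tool result written back to STDOUT."""
    result = {"content": [{"type": "text", "text": json.dumps(payload)}]}
    return json.dumps({"jsonrpc": "2.0", "id": call_id, "result": result})
```

A real side-car would also send the request, handle errors, and return images or masks as additional content items; the sketch only covers the mapping in each direction.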

Supported Use Cases (v0.1)

Capability                    | Description
agentic-document-analysis     | Parse PDFs / images to extract text, tables, charts, and diagrams, taking layout and other visual cues into account.
text-to-object-detection      | Detect objects described by free-form text prompts using OWLv2 / CountGD / Florence-2 / Agentic Object Detection; outputs bounding boxes.
text-to-instance-segmentation | Pixel-level instance masks via Florence-2 + Segment-Anything-v2 (SAM-2).
activity-recognition          | Recognise multiple activities in video with start/end timestamps.
depth-pro                     | High-resolution monocular depth estimation for single images.

Quick Start

Get Your VisionAgent API Key

If you do not have a VisionAgent API key, create an account and obtain your API key.

Installation

  1. Install: npm install -g vision-tools-mcp
  2. Configure your MCP client to launch the server (see Server Config) and make your API key available to it as VISION_AGENT_API_KEY.

Example Prompts

  • Invoice extraction: “Extract vendor, invoice date & total from this PDF using agentic-document-analysis.”
  • Pedestrian detection: “Locate every pedestrian in street.jpg via text-to-object-detection.”

Troubleshooting

  • Verify VISION_AGENT_API_KEY is correct and active.
  • Ensure outbound HTTPS to api.va.landing.ai isn’t blocked by a proxy/VPN.
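
The two checks above can be partly automated. The following is a small sketch, assuming Python 3 is available on the machine that runs the server; the function names are made up for this example. It can only confirm that the variable is set and that a TLS connection to the host succeeds. It cannot tell whether the key itself is correct or active.

```python
import os
import socket
import ssl

API_HOST = "api.va.landing.ai"  # host named in the troubleshooting list


def api_key_present(env=os.environ) -> bool:
    """True if VISION_AGENT_API_KEY is set to a non-blank value."""
    return bool(env.get("VISION_AGENT_API_KEY", "").strip())


def https_reachable(host: str = API_HOST, port: int = 443,
                    timeout: float = 5.0) -> bool:
    """True if a TLS handshake with host:port completes within the timeout."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:  # covers DNS failures, timeouts, refusals, TLS errors
        return False


def report() -> dict:
    """Run both checks and return the results by name."""
    return {"api_key_set": api_key_present(),
            "https_reachable": https_reachable()}
```

Calling `report()` in the same shell or environment that launches the MCP client gives the most useful answer, since a proxy or VPN may affect only some processes.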

Server Config

{
  "mcpServers": {
    "visionagent-mcp-server": {
      "command": "npx",
      "args": [
        "vision-tools-mcp"
      ]
    }
  }
}
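
The entry above does not show how the API key reaches the server. As a sketch only, the helper below adds a server entry that also passes `VISION_AGENT_API_KEY` through an `env` block. That MCP clients accept an `env` key and forward it to the spawned process is an assumption here (several common clients do, but check yours). The helper name `add_server_entry` is hypothetical, and the default package name is the one given in the Installation step.

```python
import json


def add_server_entry(config: dict, api_key: str,
                     package: str = "vision-tools-mcp") -> dict:
    """Return a copy of an MCP client config with a VisionAgent entry added."""
    servers = dict(config.get("mcpServers", {}))  # leave the input untouched
    servers["visionagent-mcp-server"] = {
        "command": "npx",
        "args": [package],
        # ASSUMPTION: the client passes `env` to the launched process.
        "env": {"VISION_AGENT_API_KEY": api_key},
    }
    return {**config, "mcpServers": servers}


# Print a config fragment with a placeholder key, ready to paste and edit.
print(json.dumps(add_server_entry({}, "<YOUR_API_KEY>"), indent=2))
```

Existing entries under `mcpServers` are preserved, so the function can be applied to a config that already lists other servers.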

Links & Status

Repository: github.com
Hosted: No
Global: No
Official: Yes

Project Info

Created At: Jul 07, 2025
Updated At: Aug 07, 2025
Author: LandingAI Team
Category: official
License: MIT
Tags: development, location, documentation