A simple MCP server that enables your LLM to better reason over images, video and documents.
VisionAgent MCP Server
Overview
Modern LLM “agents” call external tools through the Model Context Protocol (MCP). VisionAgent MCP is a lightweight sidecar MCP server that runs locally and communicates over STDIN/STDOUT. It translates each tool call from an MCP-compatible client into an authenticated HTTPS request to Landing AI’s VisionAgent REST APIs, then streams the response JSON, plus any images or masks, back to the model. As a result, you can issue natural-language computer-vision and document-analysis commands from your editor without writing custom REST code or loading an extra SDK.
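The request/response translation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the server's actual implementation, which this section does not show. The endpoint base URL, the one-path-per-tool layout, the POST method, and the Bearer auth header are all hypothetical. Only the JSON-RPC 2.0 `tools/call` message shape and the MCP `text`/`image` content blocks follow the MCP specification.

```python
import base64
import json
from typing import Any, Dict, Optional

# Hypothetical base URL; the real VisionAgent endpoint paths are not listed here.
API_BASE = "https://api.example.com/v1/tools"


def translate_tool_call(line: str, api_key: str) -> Dict[str, Any]:
    """Map one JSON-RPC 'tools/call' line read from STDIN to the HTTPS
    request a sidecar would send (returned as a dict, not actually sent)."""
    msg = json.loads(line)
    if msg.get("method") != "tools/call":
        raise ValueError(f"unsupported method: {msg.get('method')!r}")
    params = msg["params"]
    return {
        "id": msg["id"],  # kept so the reply can be matched to the request
        "method": "POST",  # assumed HTTP verb
        "url": f"{API_BASE}/{params['name']}",  # assumed one path per tool
        "headers": {
            # Assumed auth scheme; the real API may expect a different header.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps(params.get("arguments", {})),
    }


def wrap_response(request_id: Any, api_json: Dict[str, Any],
                  png_bytes: Optional[bytes] = None) -> str:
    """Package the API's JSON (plus an optional PNG, such as a mask) as one
    JSON-RPC result line for STDOUT, using MCP text and image content blocks."""
    content = [{"type": "text", "text": json.dumps(api_json)}]
    if png_bytes is not None:
        content.append({
            "type": "image",
            "data": base64.b64encode(png_bytes).decode("ascii"),
            "mimeType": "image/png",
        })
    result = {"jsonrpc": "2.0", "id": request_id, "result": {"content": content}}
    # The stdio transport carries one JSON message per line.
    return json.dumps(result) + "\n"
```

For example, a `tools/call` line whose `params.name` is `agentic-document-analysis` yields a request dict whose `url` is `https://api.example.com/v1/tools/agentic-document-analysis`. Keeping translation and packaging as pure functions, separate from the STDIN/STDOUT loop and the HTTP client, makes each step easy to test in isolation.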
Supported Use Cases (v0.1)
Capability | Description
--- | ---
agentic-document-analysis | Parse PDFs and images to extract text, tables, charts, and diagrams, taking layout and other visual cues into account.
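From the client side, invoking the capability in the table above amounts to writing one JSON-RPC `tools/call` message to the server's STDIN. The sketch below builds that line. The `filePath` argument key is a hypothetical placeholder; the real parameter names are whatever input schema the server reports through the MCP `tools/list` method.

```python
import json
from typing import Any, Dict


def document_analysis_request(request_id: int, file_path: str) -> str:
    """Build the newline-terminated JSON-RPC line an MCP client would write
    to the server's STDIN to invoke agentic-document-analysis."""
    message: Dict[str, Any] = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "agentic-document-analysis",
            # Hypothetical argument key; check the server's tools/list output
            # for the real input schema.
            "arguments": {"filePath": file_path},
        },
    }
    # json.dumps escapes newlines inside strings, so the whole message fits
    # on one line, as the MCP stdio transport requires.
    return json.dumps(message) + "\n"
```

In practice an MCP-compatible client generates this message for you when the model decides to call the tool; building it by hand is mainly useful for smoke-testing the server from a script.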