mcp-vision

Created 8 months ago

MCP server exposing HuggingFace computer vision models for object detection.

Tags: development, documentation, public, computer vision, HuggingFace

What is mcp-vision?

An MCP server that exposes HuggingFace computer vision models, such as zero-shot object detection, as tools, extending the vision capabilities of large language models and vision-language models.

mcp-vision Documentation

Installation

Clone the repo:

 git clone git@github.com:groundlight/mcp-vision.git

Build a local Docker image:

 cd mcp-vision
 make build-docker

Configuring Claude Desktop

Add one of the following entries to your claude_desktop_config.json.

If your local environment has access to an NVIDIA GPU:

 "mcpServers": {
   "mcp-vision": {
     "command": "docker",
     "args": ["run", "-i", "--rm", "--runtime=nvidia", "--gpus", "all", "mcp-vision"],
     "env": {}
   }
 }

Or, CPU only:

 "mcpServers": {
   "mcp-vision": {
     "command": "docker",
     "args": ["run", "-i", "--rm", "mcp-vision"],
     "env": {}
   }
 }
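The two entries above differ only in the GPU flags, so the entry can be generated instead of edited by hand. The following is a minimal Python sketch, not part of mcp-vision: the helper names `mcp_vision_entry` and `add_mcp_vision` are hypothetical, and it assumes the image is tagged `mcp-vision`, as the configs above imply. Reading and writing the actual file is left out because its location depends on your system.

```python
import json


def mcp_vision_entry(use_gpu: bool) -> dict:
    """Build the server entry shown above (hypothetical helper)."""
    args = ["run", "-i", "--rm"]
    if use_gpu:
        # Flags copied from the NVIDIA variant of the config above.
        args += ["--runtime=nvidia", "--gpus", "all"]
    # Image name; assumes the tag used in the configs above.
    args.append("mcp-vision")
    return {"command": "docker", "args": args, "env": {}}


def add_mcp_vision(config: dict, use_gpu: bool) -> dict:
    """Return a copy of a parsed claude_desktop_config.json with the entry added."""
    updated = dict(config)
    # Copy the existing servers so other entries are preserved.
    servers = dict(updated.get("mcpServers", {}))
    servers["mcp-vision"] = mcp_vision_entry(use_gpu)
    updated["mcpServers"] = servers
    return updated


if __name__ == "__main__":
    # Start from an empty config for illustration; a real file may hold other servers.
    print(json.dumps(add_mcp_vision({}, use_gpu=False), indent=2))
```

Note that the resulting file is a complete JSON object: the "mcpServers" key sits inside the outer braces, which the fragments above omit.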

Tools

The following tools are currently available through the mcp-vision server:

locate_objects - Detect and locate objects in an image using zero-shot object detection pipelines.
zoom_to_object - Zoom into an object in the image, allowing you to analyze it more closely.
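
To make the two tools more concrete, here is a sketch of the kind of logic they describe: pick the highest-scoring detection for a label, then compute a crop rectangle around it. This is not the server's implementation. The detection format follows what HuggingFace zero-shot object detection pipelines return (a score, a label, and a box with xmin, ymin, xmax, ymax); the functions `best_box` and `crop_region` and the `pad` parameter are hypothetical.

```python
from typing import Optional


def best_box(detections: list, label: str) -> Optional[dict]:
    """Return the box of the highest-scoring detection for `label`, or None."""
    matches = [d for d in detections if d["label"] == label]
    if not matches:
        return None
    return max(matches, key=lambda d: d["score"])["box"]


def crop_region(box: dict, width: int, height: int, pad: int = 10) -> tuple:
    """Expand `box` by `pad` pixels and clamp it to the image bounds.

    Returns (left, upper, right, lower), the order that image-cropping
    APIs such as Pillow's Image.crop expect.
    """
    left = max(0, box["xmin"] - pad)
    upper = max(0, box["ymin"] - pad)
    right = min(width, box["xmax"] + pad)
    lower = min(height, box["ymax"] + pad)
    return (left, upper, right, lower)


if __name__ == "__main__":
    # Made-up detections, for illustration only.
    detections = [
        {"score": 0.42, "label": "cat", "box": {"xmin": 5, "ymin": 20, "xmax": 120, "ymax": 200}},
        {"score": 0.91, "label": "cat", "box": {"xmin": 300, "ymin": 40, "xmax": 420, "ymax": 180}},
        {"score": 0.77, "label": "dog", "box": {"xmin": 10, "ymin": 10, "xmax": 90, "ymax": 90}},
    ]
    box = best_box(detections, "cat")
    print(crop_region(box, width=425, height=240))
```

Clamping to the image bounds matters when the object sits near an edge; without it the padded rectangle could extend past the image.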

Development

Run locally using the uv package manager:

 uv install
 uv run python mcp_vision

Build the Docker image locally:

 make build-docker

Run the Docker image locally:

 make run-docker-cpu

Troubleshooting

If Claude Desktop fails to connect to mcp-vision, check the entry in claude_desktop_config.json and ensure the correct model size is used.
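
Part of checking the configuration can be automated. The sketch below is an assumption-laden helper, not something shipped with mcp-vision: `check_config` is a hypothetical function that looks only for structural mistakes in the Docker-based entry from the configuration section (invalid JSON, a missing entry, a wrong command, missing arguments). It does not detect Docker, GPU, or model problems.

```python
import json


def check_config(text: str, server: str = "mcp-vision") -> list:
    """Return a list of problems found in a claude_desktop_config.json string."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} (line {exc.lineno})"]
    if not isinstance(config, dict):
        return ["top-level value must be a JSON object"]

    servers = config.get("mcpServers")
    if not isinstance(servers, dict):
        return ['missing "mcpServers" object']
    entry = servers.get(server)
    if not isinstance(entry, dict):
        return [f'no entry named "{server}"']

    problems = []
    if entry.get("command") != "docker":
        problems.append('"command" should be "docker"')
    args = entry.get("args")
    if not isinstance(args, list):
        problems.append('"args" should be a list')
    else:
        # Both documented configs pass -i, which keeps stdin open for the server.
        if "-i" not in args:
            problems.append('"args" is missing "-i"')
        # Both documented configs end with the image name.
        if not args or args[-1] != "mcp-vision":
            problems.append('"args" should end with the image name "mcp-vision"')
    return problems


if __name__ == "__main__":
    # Pasting the fragment without outer braces is not valid JSON.
    for problem in check_config('"mcpServers": {}'):
        print(problem)
```

An empty list means the entry matches the documented shape; any connection failure that remains lies elsewhere.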

Server Config

{
  "mcpServers": {
    "mcp-vision-server": {
      "command": "npx",
      "args": [
        "mcp-vision"
      ]
    }
  }
}

Links & Status

Repository: github.com

Hosted: No

Global: No

Official: No

Project Info

Created At: Jul 02, 2025

Updated At: Aug 07, 2025

Author: Groundlight

Category: community

License: MIT
