mcp-vision
Created 5 months ago
MCP server exposing HuggingFace computer vision models for object detection.
What is mcp-vision?
An MCP server that exposes HuggingFace computer vision models, such as zero-shot object detection, as tools, enhancing the vision capabilities of large language or vision-language models.
Documentation
mcp-vision Documentation
Installation
Clone the repo:
git clone git@github.com:groundlight/mcp-vision.git
Build a local Docker image:
cd mcp-vision
make build-docker
Configuring Claude Desktop
Add this to your claude_desktop_config.json:
If your local environment has access to an NVIDIA GPU:
"mcpServers": {
  "mcp-vision": {
    "command": "docker",
    "args": ["run", "-i", "--rm", "--runtime=nvidia", "--gpus", "all", "mcp-vision"],
    "env": {}
  }
}
Or, CPU only:
"mcpServers": {
  "mcp-vision": {
    "command": "docker",
    "args": ["run", "-i", "--rm", "mcp-vision"],
    "env": {}
  }
}
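The manual edit above can also be scripted. The following is a minimal sketch, not part of mcp-vision: the helper names `add_mcp_vision` and `update_config_file` are hypothetical, and the location of claude_desktop_config.json is left to the caller because it varies by platform. It merges the entry into an existing configuration without disturbing other servers.

```python
import json
from pathlib import Path


def add_mcp_vision(config: dict, use_gpu: bool = False) -> dict:
    """Return a copy of config with the mcp-vision server entry added."""
    args = ["run", "-i", "--rm"]
    if use_gpu:
        # Same GPU flags as the NVIDIA configuration shown in this section.
        args += ["--runtime=nvidia", "--gpus", "all"]
    args.append("mcp-vision")

    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers["mcp-vision"] = {"command": "docker", "args": args, "env": {}}
    updated["mcpServers"] = servers
    return updated


def update_config_file(path: Path, use_gpu: bool = False) -> None:
    """Read the config at path (if present), add the entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_mcp_vision(config, use_gpu), indent=2))
```

For example, `update_config_file(Path("claude_desktop_config.json"), use_gpu=True)` would write the GPU variant, assuming the file sits in the current directory. Restart Claude Desktop afterwards so it rereads the file.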
Tools
The following tools are currently available through the mcp-vision server:
- locate_objects - Detect and locate objects in an image using zero-shot object detection pipelines.
- zoom_to_object - Zoom into an object in the image, allowing you to analyze it more closely.
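To make the two tools concrete, here is a rough sketch of the detect-then-zoom idea behind them. It is not the server's implementation: `best_detection`, `zoom_box`, the padding value, and the sample detections are all illustrative assumptions. The detection dictionaries follow the output shape of the Hugging Face `zero-shot-object-detection` pipeline (a `score`, a `label`, and a `box` with `xmin`, `ymin`, `xmax`, `ymax`).

```python
from typing import Optional


def best_detection(detections: list[dict], label: str, min_score: float = 0.1) -> Optional[dict]:
    """Pick the highest-scoring detection for label, or None if none qualifies."""
    matches = [d for d in detections if d["label"] == label and d["score"] >= min_score]
    return max(matches, key=lambda d: d["score"], default=None)


def zoom_box(box: dict, image_size: tuple[int, int], padding: float = 0.1) -> tuple[int, int, int, int]:
    """Expand box by a fraction of its size and clamp it to the image bounds."""
    width, height = image_size
    pad_x = (box["xmax"] - box["xmin"]) * padding
    pad_y = (box["ymax"] - box["ymin"]) * padding
    left = max(0, round(box["xmin"] - pad_x))
    top = max(0, round(box["ymin"] - pad_y))
    right = min(width, round(box["xmax"] + pad_x))
    bottom = min(height, round(box["ymax"] + pad_y))
    return left, top, right, bottom


# Made-up detections for a hypothetical 640x480 image.
detections = [
    {"score": 0.42, "label": "cat", "box": {"xmin": 100, "ymin": 50, "xmax": 300, "ymax": 250}},
    {"score": 0.91, "label": "cat", "box": {"xmin": 400, "ymin": 200, "xmax": 600, "ymax": 470}},
    {"score": 0.30, "label": "dog", "box": {"xmin": 10, "ymin": 10, "xmax": 90, "ymax": 90}},
]
cat = best_detection(detections, "cat")
crop = zoom_box(cat["box"], image_size=(640, 480))
print(crop)  # (380, 173, 620, 480)
```

The returned tuple is in (left, top, right, bottom) order, which is the order Pillow's `Image.crop` expects, so the crop region could be passed to it directly.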
Development
Run locally using the uv package manager:
uv sync
uv run python mcp_vision
Build the Docker image locally:
make build-docker
Run the Docker image locally:
make run-docker-cpu
Troubleshooting
If Claude Desktop fails to connect to mcp-vision, check the mcp-vision entry in claude_desktop_config.json and make sure the correct model size is used.
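A quick sanity check of the configuration can rule out the most common mistakes. The sketch below is a hypothetical helper, not part of mcp-vision: the name `check_server_entry` and its `image` parameter are assumptions, and it only verifies the shape of the entry and that the command can be found on `PATH`. It does not check the model size.

```python
import shutil


def check_server_entry(config: dict, name: str = "mcp-vision", image: str = "mcp-vision") -> list[str]:
    """Return a list of human-readable problems found in the server entry."""
    entry = config.get("mcpServers", {}).get(name)
    if entry is None:
        return [f"no '{name}' entry under 'mcpServers'"]

    problems = []
    command = entry.get("command")
    if not isinstance(command, str) or not command:
        problems.append("'command' must be a non-empty string")
    elif shutil.which(command) is None:
        # The executable (for example docker) is not visible to this process.
        problems.append(f"'{command}' was not found on PATH")

    args = entry.get("args")
    if not isinstance(args, list) or not all(isinstance(a, str) for a in args):
        problems.append("'args' must be a list of strings")
    elif image not in args:
        problems.append(f"'args' does not name the '{image}' image")
    return problems
```

An empty list means the entry looks structurally sound. Load the file with `json.load` first and pass the resulting dictionary in.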
Server Config
{
  "mcpServers": {
    "mcp-vision-server": {
      "command": "npx",
      "args": [
        "mcp-vision"
      ]
    }
  }
}