VideoMemory

VideoMemory is a video monitoring system for agents. It keeps camera and video streams running as long-lived inputs, samples and filters frames, calls vision models only when useful, and turns visual conditions into task-scoped notes, evidence, and webhook events.

The project is built around a simple boundary:

video streams
  -> VideoMemory ingestion, filtering, buffering, and evidence capture
  -> task state and event delivery
  -> external agents decide what to do next

The goal is not another one-off camera demo. The goal is to make video streams agent-addressable without forcing every agent to implement camera discovery, stream recovery, model selection, semantic filtering, evidence alignment, readiness checks, or cost accounting.
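
From the agent's side of this boundary, the work reduces to parsing an event and choosing an action. The sketch below illustrates that split under stated assumptions: the `VisualEvent` fields (`task_id`, `note`, `evidence_url`) and the action names are hypothetical and are not taken from the real VideoMemory webhook payload.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class VisualEvent:
    # Hypothetical fields for illustration; the real payload may differ.
    task_id: str
    note: str
    evidence_url: Optional[str] = None


def parse_event(raw_body: bytes) -> VisualEvent:
    """Decode a JSON webhook body into a VisualEvent."""
    data = json.loads(raw_body.decode("utf-8"))
    return VisualEvent(
        task_id=str(data["task_id"]),
        note=str(data.get("note", "")),
        evidence_url=data.get("evidence_url"),
    )


def decide(event: VisualEvent) -> str:
    """Agent-side policy: the video layer reports, the agent picks the action."""
    if not event.note:
        return "ignore"
    if event.evidence_url:
        return "review_evidence"
    return "notify"
```

Keeping `decide` outside the video layer mirrors the diagram: nothing in the parser needs to know about cameras, models, or sampling.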

What It Does

  • Watches local, USB, browser, Android, and network camera streams.
  • Lets agents create natural-language monitoring tasks over those streams.
  • Supports one-off frame captioning, fresh image capture, latest-frame preview, and persistent monitor tasks.
  • Uses local semantic filtering and task-specific sampling to reduce unnecessary vision-model calls.
  • Saves evidence frames and short clips so detections can be inspected after the fact.
  • Exposes HTTP APIs, an OpenAPI schema, webhook integration, OpenClaw hooks, and a Claude Code channel adapter.
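
To make the cost-reduction idea in the list above concrete, here is a minimal sketch of gating vision-model calls behind a cheap local check. It is not VideoMemory's actual filter: it swaps in a plain pixel-difference test plus a minimum sampling interval as a stand-in for semantic filtering, and the `FrameGate` class, its parameters, and the flat grayscale frame representation are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def mean_abs_diff(a: Sequence[int], b: Sequence[int]) -> float:
    """Average per-pixel absolute difference between two grayscale frames."""
    if not a or len(a) != len(b):
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


class FrameGate:
    """Forward a frame to the vision model only if it changed enough and
    the minimum interval since the last forwarded frame has elapsed."""

    def __init__(self, threshold: float = 8.0, min_interval_s: float = 2.0) -> None:
        self.threshold = threshold
        self.min_interval_s = min_interval_s
        self._last_frame: Optional[Sequence[int]] = None
        self._last_time: Optional[float] = None

    def should_call_model(self, frame: Sequence[int], now: float) -> bool:
        if self._last_frame is None or self._last_time is None:
            # Always forward the first frame so there is a baseline.
            self._last_frame, self._last_time = frame, now
            return True
        if now - self._last_time < self.min_interval_s:
            return False  # sampled too recently
        if mean_abs_diff(frame, self._last_frame) < self.threshold:
            return False  # scene is effectively unchanged
        self._last_frame, self._last_time = frame, now
        return True


gate = FrameGate(threshold=8.0, min_interval_s=2.0)
still = [10] * 16                 # toy 16-pixel frame
moved = [10] * 8 + [60] * 8       # half the pixels changed
```

A per-task threshold and interval would be one way to express task-specific sampling: a slow-changing scene could tolerate a longer interval than a doorway.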

Start Here

Install Path

The simplest path to share with a new user is the Claude Code plugin:

claude auth login
claude plugin marketplace add https://github.com/Clamepending/videomemory
claude plugin install videomemory@videomemory

To run the core service, start the Flask app from a local checkout of the VideoMemory repo:

uv run flask_app/app.py

The service then exposes its API at http://localhost:5050, including GET /api/health, GET /api/devices, POST /api/caption_frame, POST /api/tasks, and GET /openapi.json.
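
The endpoints above can be exercised with the Python standard library alone. The following is a sketch, not a documented client: the helper names `build_request` and `send` are made up here, and the JSON body for `POST /api/tasks` uses hypothetical field names (`device`, `description`). The schema served at `GET /openapi.json` is the place to confirm the real request shape.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "http://localhost:5050"  # address stated in the text above


def build_request(
    method: str, path: str, payload: Optional[dict] = None
) -> urllib.request.Request:
    """Build (but do not send) a JSON request against the local service."""
    data = None
    headers = {"Accept": "application/json"}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def send(request: urllib.request.Request, timeout: float = 5.0) -> Any:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


health_request = build_request("GET", "/api/health")

# Hypothetical body: these field names are assumptions, not the real schema.
task_request = build_request(
    "POST",
    "/api/tasks",
    {"device": "0", "description": "Tell me when a package is left at the door"},
)
```

With the service running, `send(health_request)` would return the decoded health response. Building requests separately from sending them keeps the sketch inspectable without a live server.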

Project Direction

The durable contribution is the agent-agnostic video layer: many streams, limited hardware, cost-aware filtering, explicit readiness, and aligned evidence. Claude Code and OpenClaw are useful adapters, but the core abstraction is broader: visual conditions become structured events that any external agent can consume.