🎬 VideoSearch-R1: Video Verification & Temporal Grounding
VideoSearch-R1 is an agentic framework that unifies inter-video retrieval and intra-video reasoning through multi-turn interaction with a video search engine. It introduces Soft Query Refinement (SQR), which refines query tokens in a continuous latent space instead of rewriting text.
This demo focuses on the verification and temporal grounding step: given a video and a natural-language query, the model reasons about whether the described scene appears in the video and, if it does, predicts the scene's start and end timestamps.
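The verification and grounding output described above can be thought of as a small record: a yes/no decision plus an optional time segment. A common way to compare a predicted segment with a reference one is temporal IoU (intersection over union of the two intervals). The sketch below is hypothetical. The class `GroundingPrediction`, the functions `temporal_iou` and `score`, and the scoring rule (0 on a verification mismatch, 1 for a correct rejection, temporal IoU otherwise) are illustrative assumptions, not the model's actual output format or the paper's evaluation protocol.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class GroundingPrediction:
    """Hypothetical container for one verification + grounding result."""
    present: bool                   # does the queried scene appear in the video?
    start: Optional[float] = None   # start time in seconds (only when present)
    end: Optional[float] = None     # end time in seconds (only when present)

    def __post_init__(self) -> None:
        # A positive answer must come with a valid, ordered segment.
        if self.present:
            if self.start is None or self.end is None:
                raise ValueError("a positive prediction needs start and end")
            if not 0.0 <= self.start <= self.end:
                raise ValueError("expected 0 <= start <= end")


def temporal_iou(pred: Tuple[float, float], gt: Tuple[float, float]) -> float:
    """Intersection over union of two [start, end] segments in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    # For disjoint segments inter is 0, so using the hull as the union is safe.
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0


def score(pred: GroundingPrediction, gt: GroundingPrediction) -> float:
    """Hypothetical per-example score combining verification and grounding."""
    if pred.present != gt.present:
        return 0.0  # wrong yes/no decision
    if not gt.present:
        return 1.0  # scene correctly reported as absent
    return temporal_iou((pred.start, pred.end), (gt.start, gt.end))


if __name__ == "__main__":
    pred = GroundingPrediction(True, 12.0, 18.0)
    gt = GroundingPrediction(True, 10.0, 16.0)
    print(score(pred, gt))  # 4 s overlap / 8 s union
```

Here a prediction of 12 to 18 s against a reference of 10 to 16 s overlaps for 4 s out of an 8 s union, giving a temporal IoU of 0.5. Keeping the yes/no decision separate from the segment makes it easy to score the verification step on its own.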
📄 Paper (arXiv:2607.00446) · 🌐 Project Page · 🤗 Model Checkpoints · 💻 GitHub