ImagerySearch: Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints

October 16, 2025
Authors: Meiqi Wu, Jiashu Zhu, Xiaokun Feng, Chubin Chen, Chen Zhu, Bingze Song, Fangyuan Mao, Jiahong Wu, Xiangxiang Chu, Kaiqi Huang
cs.AI

Abstract

Video generation models have achieved remarkable progress, particularly in realistic scenarios; however, their performance degrades notably in imaginative scenarios. Prompts for such scenarios often involve rarely co-occurring concepts with long-distance semantic relationships, which fall outside the training distribution. Existing methods typically apply test-time scaling to improve video quality, but their fixed search spaces and static reward designs limit adaptability to imaginative scenarios. To fill this gap, we propose ImagerySearch, a prompt-guided adaptive test-time search strategy that dynamically adjusts both the inference search space and the reward function according to the semantic relationships in the prompt. This enables more coherent and visually plausible videos in challenging imaginative settings. To evaluate progress in this direction, we introduce LDT-Bench, the first dedicated benchmark for long-distance semantic prompts, consisting of 2,839 diverse concept pairs and an automated protocol for assessing creative generation capabilities. Extensive experiments show that ImagerySearch consistently outperforms strong video generation baselines and existing test-time scaling approaches on LDT-Bench, and achieves competitive improvements on VBench, demonstrating its effectiveness across diverse prompt types. We will release LDT-Bench and code to facilitate future research on imaginative video generation.
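
The abstract describes the idea only at a high level, so the following is a minimal, hypothetical sketch of what "adapting the search space and reward to the prompt" could look like in a best-of-N search. It is not the authors' implementation: the function names, the `distance` input (assumed to be a precomputed semantic distance in [0, 1] between the prompt's concepts), and every constant are assumptions made purely for illustration.

```python
import random
from typing import Callable, Tuple

# A candidate is represented here by two toy scores: (visual quality, prompt alignment).
Candidate = Tuple[float, float]


def candidate_budget(distance: float, base: int = 4, scale: float = 2.0) -> int:
    """Hypothetical rule: sample more candidates as semantic distance grows."""
    d = min(max(distance, 0.0), 1.0)  # clamp to [0, 1]
    return round(base * (1.0 + scale * d))


def adaptive_reward(quality: float, alignment: float, distance: float) -> float:
    """Hypothetical rule: weight prompt alignment more for distant concept pairs."""
    d = min(max(distance, 0.0), 1.0)
    w_align = 0.5 + 0.4 * d  # assumed weighting, ranges from 0.5 to 0.9
    return (1.0 - w_align) * quality + w_align * alignment


def search(
    distance: float,
    sample: Callable[[random.Random], Candidate],
    rng: random.Random,
) -> Tuple[Candidate, int]:
    """Best-of-N selection whose N and scoring both depend on `distance`."""
    n = candidate_budget(distance)
    candidates = [sample(rng) for _ in range(n)]
    best = max(candidates, key=lambda c: adaptive_reward(c[0], c[1], distance))
    return best, n


if __name__ == "__main__":
    # Stand-in sampler: a real system would generate and score videos instead.
    toy_sampler = lambda r: (r.random(), r.random())
    best, n = search(distance=0.8, sample=toy_sampler, rng=random.Random(0))
    print(f"evaluated {n} candidates, kept {best}")
```

In this toy version, a fixed-budget, static-reward baseline corresponds to always calling the two helper functions with `distance=0.0`; the adaptive variant differs only in letting the prompt's semantic distance change both the number of candidates and the scoring weights.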