ImagerySearch: Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints
October 16, 2025
作者: Meiqi Wu, Jiashu Zhu, Xiaokun Feng, Chubin Chen, Chen Zhu, Bingze Song, Fangyuan Mao, Jiahong Wu, Xiangxiang Chu, Kaiqi Huang
cs.AI
Abstract
Video generation models have achieved remarkable progress, particularly
excelling in realistic scenarios; however, their performance degrades notably
in imaginative scenarios. These prompts often involve rarely co-occurring
concepts with long-distance semantic relationships, falling outside training
distributions. Existing methods typically apply test-time scaling to improve
video quality, but their fixed search spaces and static reward designs limit
adaptability to imaginative scenarios. To fill this gap, we propose
ImagerySearch, a prompt-guided adaptive test-time search strategy that
dynamically adjusts both the inference search space and reward function
according to semantic relationships in the prompt. This enables more coherent
and visually plausible videos in challenging imaginative settings. To evaluate
progress in this direction, we introduce LDT-Bench, the first dedicated
benchmark for long-distance semantic prompts, consisting of 2,839 diverse
concept pairs and an automated protocol for assessing creative generation
capabilities. Extensive experiments show that ImagerySearch consistently
outperforms strong video generation baselines and existing test-time scaling
approaches on LDT-Bench, and achieves competitive improvements on VBench,
demonstrating its effectiveness across diverse prompt types. We will release
LDT-Bench and code to facilitate future research on imaginative video
generation.
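The core idea described above — widening the test-time search space and reweighting the reward as the prompt's concepts grow semantically more distant — can be sketched as follows. This is a minimal toy illustration, not the authors' implementation: `semantic_distance`, `adaptive_search`, and the `(fidelity, coherence)` draft scores are all hypothetical stand-ins for an embedding-based distance, the actual sampling of video candidates, and the paper's reward model.

```python
import random

def semantic_distance(concept_a, concept_b):
    # Toy proxy for semantic distance: character-set dissimilarity in [0, 1].
    # A real system would use embedding distance between the two concepts.
    union = set(concept_a) | set(concept_b)
    overlap = set(concept_a) & set(concept_b)
    return 1.0 - len(overlap) / max(len(union), 1)

def adaptive_search(prompt_concepts, base_width=2, max_width=8, seed=0):
    """Sketch of prompt-guided adaptive test-time search: the candidate
    pool widens and the reward shifts toward cross-concept coherence as
    the prompt's concept pair grows semantically more distant."""
    rng = random.Random(seed)
    concept_a, concept_b = prompt_concepts
    dist = semantic_distance(concept_a, concept_b)

    # Larger semantic distance -> larger inference search space.
    width = min(max_width, base_width + int(dist * max_width))

    # Larger semantic distance -> reward emphasizes coherence between
    # the rarely co-occurring concepts over per-concept fidelity.
    coherence_weight = 0.5 + 0.5 * dist

    def reward(draft):
        fidelity, coherence = draft
        return (1 - coherence_weight) * fidelity + coherence_weight * coherence

    # Each "draft" is a toy (fidelity, coherence) score pair standing in
    # for one sampled video candidate.
    drafts = [(rng.random(), rng.random()) for _ in range(width)]
    return max(drafts, key=reward), width

best_draft, width = adaptive_search(("glass", "volcano"))
```

In this sketch, a distant pair like `("glass", "volcano")` yields a high toy distance (0.75), so the search width grows from the base of 2 toward the cap of 8 and the reward leans toward coherence, mirroring the paper's claim that fixed search spaces and static rewards underperform on long-distance semantic prompts.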