HLFormer: Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning

Abstract

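Partially Relevant Video Retrieval (PRVR) addresses the challenge of matching untrimmed videos with text queries that describe only part of their content. Existing methods suffer from geometric distortion in Euclidean space, which can misrepresent the intrinsic hierarchical structure of videos and overlook certain hierarchical semantics, ultimately leading to suboptimal temporal modeling. To address this issue, we propose HLFormer, the first hyperbolic modeling framework for PRVR, which leverages hyperbolic space learning to compensate for the limited hierarchical modeling capability of Euclidean space. Specifically, HLFormer integrates a Lorentz Attention Block and a Euclidean Attention Block to encode video embeddings in hybrid spaces, and uses a Mean-Guided Adaptive Interaction Module to fuse the resulting features dynamically. In addition, we introduce a Partial Order Preservation Loss that enforces the "text < video" hierarchy through Lorentzian cone constraints. This further improves cross-modal matching by reinforcing the partial relevance between video content and text queries. Extensive experiments show that HLFormer outperforms state-of-the-art methods. Code is released at https://github.com/lijun2005/ICCV25-HLFormer.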
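The abstract does not spell out how features are placed in hyperbolic space, so the following is only a generic sketch of the Lorentz (hyperboloid) model that terms such as "Lorentz Attention Block" and "Lorentzian cone" refer to. It is not HLFormer's implementation. It assumes curvature -1, and the function names (`lorentz_inner`, `exp_map_origin`, `lorentz_distance`) and the toy vectors are hypothetical, chosen for illustration.

```python
import math

def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def exp_map_origin(u):
    """Lift a Euclidean vector u onto the unit hyperboloid (assumed curvature -1)
    using the exponential map at the origin (1, 0, ..., 0)."""
    norm = math.sqrt(sum(a * a for a in u))
    if norm == 0.0:
        return [1.0] + [0.0] * len(u)
    scale = math.sinh(norm) / norm
    return [math.cosh(norm)] + [scale * a for a in u]

def lorentz_distance(x, y):
    """Geodesic distance on the hyperboloid; the clamp guards against
    rounding that pushes the acosh argument slightly below 1."""
    return math.acosh(max(1.0, -lorentz_inner(x, y)))

# Toy Euclidean features (hypothetical values, not taken from the paper).
origin = exp_map_origin([0.0, 0.0])
clip = exp_map_origin([0.3, -0.2])
video = exp_map_origin([0.6, -0.4])

print(lorentz_inner(clip, clip))      # close to -1: the point lies on the hyperboloid
print(lorentz_distance(origin, clip)) # equals the Euclidean norm of [0.3, -0.2]
print(lorentz_distance(clip, video))
```

Points farther from the origin leave exponentially more room around them, which is the usual motivation for embedding hierarchies in hyperbolic space rather than Euclidean space.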

