Geometry-Aware Image Flow Matching

Abstract

Recent advances in generative modeling highlight the power of geometry-aware modeling in manifold-constrained settings. For natural images, however, the field remains confined to Euclidean assumptions and fails to exploit the intrinsic geometric structure of the data. In this work, we investigate the geometry of natural images and observe that semantic information is predominantly encoded in the directional component, while the norm component can be approximated by its global average. This property holds in both RGB and latent spaces, suggesting that natural images can be effectively modeled on a hypersphere. Building on this finding, we introduce Spherical Optimal Transport Flow Matching (SOT-CFM), which uses angular distance, and Spherical Flow Matching (SFM), which constrains the dynamics directly to the manifold. Our experiments show that these geometry-aware methods outperform Euclidean baselines. This work thus offers a new perspective that bridges Riemannian manifold-based modeling and natural image generation.
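
To make the geometric ingredients named in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes that samples are flattened vectors, that "angular distance" means the great-circle distance between unit vectors, that the angular pairing minimizes the summed squared angular distance over a minibatch (one plausible reading of SOT-CFM), and that dynamics on the hypersphere follow geodesics (spherical linear interpolation). All function names are hypothetical.

```python
import math
from itertools import permutations


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def to_sphere(v):
    """Directional component: v rescaled to unit norm (v must be nonzero)."""
    n = norm(v)
    return [x / n for x in v]


def mean_norm(samples):
    """Global average norm, used here as a stand-in for per-sample norms."""
    return sum(norm(v) for v in samples) / len(samples)


def angular_distance(u, v):
    """Great-circle (geodesic) distance between two unit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against round-off


def pair_by_angular_cost(sources, targets):
    """Assumed pairing rule: the permutation p minimizing the summed squared
    angular distance between sources[i] and targets[p[i]].
    Brute force, so only suitable for tiny illustrative batches."""
    n = len(sources)

    def cost(p):
        return sum(angular_distance(sources[i], targets[p[i]]) ** 2
                   for i in range(n))

    return list(min(permutations(range(n)), key=cost))


def slerp(x0, x1, t):
    """Point at time t on the geodesic from x0 to x1 (both unit vectors).
    Ill-conditioned when x0 and x1 are nearly antipodal."""
    theta = angular_distance(x0, x1)
    if theta < 1e-8:
        return list(x0)
    s = math.sin(theta)
    w0 = math.sin((1.0 - t) * theta) / s
    w1 = math.sin(t * theta) / s
    return [w0 * a + w1 * b for a, b in zip(x0, x1)]


def geodesic_velocity(x0, x1, t):
    """Time derivative of slerp: a vector tangent to the sphere at
    slerp(x0, x1, t), which could serve as a regression target."""
    theta = angular_distance(x0, x1)
    if theta < 1e-8:
        return [0.0] * len(x0)
    s = math.sin(theta)
    w0 = -theta * math.cos((1.0 - t) * theta) / s
    w1 = theta * math.cos(t * theta) / s
    return [w0 * a + w1 * b for a, b in zip(x0, x1)]
```

Under these assumptions, a training pair would be built by projecting a noise sample and a data sample with `to_sphere`, drawing a time `t`, and regressing a model's output at `slerp(x0, x1, t)` onto `geodesic_velocity(x0, x1, t)`. A generated direction could then be rescaled by `mean_norm` of the training set to recover a full vector. For realistic batch sizes the brute-force pairing would need to be replaced by a proper assignment or optimal transport solver.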