Geometry-Aware Image Flow Matching

Abstract

Recent advances in generative models highlight the power of geometry-aware modeling in manifold-constrained settings. For natural images, however, existing work still relies on Euclidean assumptions and leaves the intrinsic geometric structure of the data unexploited. In this work, we investigate the geometry of natural images and observe that semantic information is predominantly encoded in the directional component, while the norm component can be approximated by its global average. This property holds in both RGB and latent spaces, suggesting that natural images can be effectively modeled on a hypersphere. Building on this finding, we introduce Spherical Optimal Transport Flow Matching (SOT-CFM), which uses angular distance, and Spherical Flow Matching (SFM), which constrains the dynamics directly to the manifold. In our experiments, these geometry-aware methods outperform Euclidean baselines. Ultimately, this work offers a novel perspective that bridges the gap between Riemannian manifold-based modeling and natural image generation.
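The abstract gives no implementation details, so the following is only a minimal NumPy sketch of the generic ingredients it names: splitting samples into direction and norm, measuring angular distance, and moving along great-circle arcs so that a trajectory stays on the unit hypersphere. It assumes images (or latents) are flattened to the rows of an array. All function names are hypothetical, and the code should not be read as the authors' SOT-CFM or SFM.

```python
import numpy as np


def split_direction_norm(x, eps=1e-12):
    """Split each row of x into a unit direction and its Euclidean norm."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps), norms


def reattach_mean_norm(u, norms):
    """Rescale unit directions by the global average norm (illustrative)."""
    return u * norms.mean()


def angular_distance(u, v):
    """Pairwise great-circle distance between rows of unit vectors u and v."""
    cos = np.clip(u @ v.T, -1.0, 1.0)
    return np.arccos(cos)


def slerp(u0, u1, t, eps=1e-7):
    """Point at time t on the great-circle arc from u0 to u1 (row-wise)."""
    cos = np.clip(np.sum(u0 * u1, axis=-1, keepdims=True), -1.0, 1.0)
    theta = np.arccos(cos)
    sin = np.maximum(np.sin(theta), eps)
    out = (np.sin((1.0 - t) * theta) * u0 + np.sin(t * theta) * u1) / sin
    # Identical endpoints make the arc degenerate; fall back to u0 there.
    return np.where(theta < eps, u0, out)


def slerp_velocity(u0, u1, t, eps=1e-7):
    """Time derivative of slerp; tangent to the sphere at slerp(u0, u1, t)."""
    cos = np.clip(np.sum(u0 * u1, axis=-1, keepdims=True), -1.0, 1.0)
    theta = np.arccos(cos)
    sin = np.maximum(np.sin(theta), eps)
    return theta * (-np.cos((1.0 - t) * theta) * u0 + np.cos(t * theta) * u1) / sin
```

In a flow-matching setup of this kind, `slerp(u0, u1, t)` would play the role of the interpolated sample and `slerp_velocity(u0, u1, t)` the regression target for the vector field, while `angular_distance` could serve as the cost matrix when pairing noise and data samples. These roles are assumptions made for illustration. Exactly antipodal pairs, for which the great-circle arc is not unique, are not handled.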