Geometry-Aware Image Flow Matching

Abstract

Recent advances in generative modeling highlight the power of geometry-aware modeling in manifold-constrained settings. For natural images, however, the field still relies on Euclidean assumptions and does not exploit the intrinsic geometric structure of the data. In this work, we investigate the geometry of natural images and observe that semantic information is encoded predominantly in the directional component, while the norm component can be approximated by its global average. This property holds in both RGB and latent spaces, suggesting that natural images can be modeled effectively on a hypersphere. Building on this finding, we introduce Spherical Optimal Transport Flow Matching (SOT-CFM), which uses angular distance, and Spherical Flow Matching (SFM), which constrains the dynamics directly to the manifold. Our experiments show that these geometry-aware methods outperform Euclidean baselines. Ultimately, this work offers a new perspective that bridges the gap between Riemannian manifold-based modeling and natural image generation.
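The geometric ingredients named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the helper names (`to_direction`, `angular_distance`, `slerp`) are hypothetical, random Gaussian vectors stand in for flattened images, and the sketch only assumes that a spherical method would measure cost by great-circle distance and move samples along great-circle arcs.

```python
import math
import random

def norm(x):
    """Euclidean norm of a flattened sample."""
    return math.sqrt(sum(v * v for v in x))

def to_direction(x):
    """Split a sample into (unit direction, norm)."""
    n = norm(x)
    return [v / n for v in x], n

def angular_distance(u, v):
    """Geodesic distance in radians between two unit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))

def slerp(u0, u1, t):
    """Point at fraction t along the great-circle arc from u0 to u1."""
    theta = angular_distance(u0, u1)
    s = math.sin(theta)
    if s < 1e-8:  # nearly parallel; antipodal pairs are not handled in this sketch
        return list(u0)
    w0 = math.sin((1.0 - t) * theta) / s
    w1 = math.sin(t * theta) / s
    return [w0 * a + w1 * b for a, b in zip(u0, u1)]

random.seed(0)
# Stand-ins for flattened images (assumption: real data would be RGB pixels or latents).
samples = [[random.gauss(0, 1) for _ in range(16)] for _ in range(4)]
pairs = [to_direction(x) for x in samples]

# Replace each per-sample norm with the global mean norm, keeping the direction.
mean_norm = sum(n for _, n in pairs) / len(pairs)
approx = [[mean_norm * v for v in d] for d, _ in pairs]

# Geodesic midpoint between a noise direction and a data direction.
noise, _ = to_direction([random.gauss(0, 1) for _ in range(16)])
target = pairs[0][0]
midpoint = slerp(noise, target, 0.5)
```

In this sketch, `approx` corresponds to the observation that norms can be replaced by their global average, `angular_distance` is the kind of cost an optimal-transport coupling on the sphere might use, and `slerp` traces a path that never leaves the unit hypersphere, in contrast to straight-line Euclidean interpolation.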