Learning on the Manifold: Unlocking Standard Diffusion Transformers with Representation Encoders

February 10, 2026
Authors: Amandeep Kumar, Vishal M. Patel
cs.AI

Abstract

Leveraging representation encoders for generative modeling offers a path to efficient, high-fidelity synthesis. However, standard diffusion transformers fail to converge on these representations directly. While recent work attributes this to a capacity bottleneck and proposes computationally expensive width scaling of diffusion transformers, we demonstrate that the failure is fundamentally geometric. We identify Geometric Interference as the root cause: standard Euclidean flow matching forces probability paths through the low-density interior of the hyperspherical feature space of representation encoders, rather than along the manifold surface. To resolve this, we propose Riemannian Flow Matching with Jacobi Regularization (RJF). By constraining the generative process to manifold geodesics and correcting for curvature-induced error propagation, RJF enables standard Diffusion Transformer architectures to converge without width scaling. With RJF, the standard DiT-B architecture (131M parameters) converges effectively, achieving an FID of 3.37 where prior methods fail to converge. Code: https://github.com/amandpkr/RJF
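The core geometric idea, interpolating along great-circle geodesics of the unit hypersphere rather than along straight Euclidean chords that cut through its low-density interior, can be illustrated with a short flow-matching training step. This is a minimal sketch under assumptions, not the authors' implementation: the names `slerp`, `slerp_velocity`, and `geodesic_fm_loss`, the `model(x_t, t)` signature, and the plain MSE objective are illustrative, and the paper's Jacobi regularization term is not reproduced here; see the linked repository for the actual method.

```python
import torch
import torch.nn.functional as F

def slerp(x0, x1, t, eps=1e-7):
    """Great-circle (geodesic) interpolation between unit vectors x0 and x1.

    x0, x1: (batch, dim) points on the unit hypersphere; t: (batch, 1) times in [0, 1].
    Unlike the Euclidean path (1 - t) * x0 + t * x1, this never leaves the sphere surface.
    """
    cos_theta = (x0 * x1).sum(dim=-1, keepdim=True).clamp(-1 + eps, 1 - eps)
    theta = torch.acos(cos_theta)
    sin_theta = torch.sin(theta)
    return (torch.sin((1 - t) * theta) * x0 + torch.sin(t * theta) * x1) / sin_theta

def slerp_velocity(x0, x1, t, eps=1e-7):
    """Analytic time derivative of the geodesic path: the flow-matching target velocity."""
    cos_theta = (x0 * x1).sum(dim=-1, keepdim=True).clamp(-1 + eps, 1 - eps)
    theta = torch.acos(cos_theta)
    sin_theta = torch.sin(theta)
    return theta * (torch.cos(t * theta) * x1 - torch.cos((1 - t) * theta) * x0) / sin_theta

def geodesic_fm_loss(model, feats):
    """One flow-matching step with the conditional path constrained to the sphere.

    model: predicts a velocity from (x_t, t); feats: (batch, dim) encoder features.
    """
    x1 = F.normalize(feats, dim=-1)                 # encoder features live on the unit sphere
    x0 = F.normalize(torch.randn_like(x1), dim=-1)  # noise projected onto the sphere
    t = torch.rand(x1.size(0), 1, device=x1.device)
    x_t = slerp(x0, x1, t)                          # sample along the geodesic
    u_t = slerp_velocity(x0, x1, t)                 # target velocity tangent to the sphere
    v_pred = model(x_t, t)
    return F.mse_loss(v_pred, u_t)
```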