Complexity-Balanced Diffusion Splitting

Abstract

Standard continuous-time generative models rely on monolithic architectures that must handle vastly different signal regimes, from isotropic noise to intricate data distributions. While scaling model capacity improves performance, deploying a large network uniformly across the entire generative timeline is inherently inefficient. In this work, we propose Complexity-Balanced Splitting (CBS), a principled framework for temporal capacity allocation that distributes the generative workload across multiple specialized sub-networks. Grounded in function approximation theory and de Boor's equidistribution principle, CBS partitions the diffusion timeline into segments of equal approximation burden, allocating more representational capacity to regions where the generative dynamics are more difficult to model. To estimate this local complexity, we introduce two complementary and tractable monitor functions: a spatial measure based on the flow's Dirichlet energy, and a geometric measure based on the acceleration of the sampling trajectories. Because these complexity profiles are estimated with a lightweight auxiliary model, our approach eliminates the need for heuristic temporal splits or computationally expensive search procedures. Extensive evaluation across multiple architectures (SiT, JiT, and UNet) and datasets demonstrates that CBS consistently improves synthesis quality without increasing per-step inference cost. In particular, CBS improves FID by approximately 35% on SiT-XL with CFG relative to naive temporal partitioning. The project page is available at https://noamissachar.github.io/CBS/.
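
The equidistribution idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `equidistribute`, the time range [0, 1], and the synthetic monitor profile are all assumptions made for illustration. Given samples of a positive monitor function on a time grid, the sketch places breakpoints so that every segment carries the same share of the monitor's integral.

```python
import numpy as np

def equidistribute(t, monitor, num_segments):
    """Return num_segments + 1 breakpoints that split the integral of
    `monitor` over `t` into equal parts (hypothetical helper)."""
    t = np.asarray(t, dtype=float)
    m = np.asarray(monitor, dtype=float)
    # Cumulative integral of the monitor via the trapezoidal rule.
    increments = 0.5 * (m[1:] + m[:-1]) * np.diff(t)
    cumulative = np.concatenate(([0.0], np.cumsum(increments)))
    # Equal shares of the total "approximation burden".
    targets = np.linspace(0.0, cumulative[-1], num_segments + 1)
    # Invert the cumulative integral by linear interpolation.
    # This assumes monitor > 0, so `cumulative` is strictly increasing.
    return np.interp(targets, cumulative, t)

# Synthetic complexity profile (an assumption, not measured data):
# a flat baseline with a bump of higher complexity near t = 0.8.
t = np.linspace(0.0, 1.0, 1001)
monitor = 1.0 + 9.0 * np.exp(-((t - 0.8) / 0.1) ** 2)

breakpoints = equidistribute(t, monitor, num_segments=4)
print(breakpoints)
```

In this toy profile the segments near the bump come out shorter than the segment covering the flat region, so a sub-network assigned to each segment would face a comparable share of the total complexity. In the setting described in the abstract, the monitor values would instead come from the Dirichlet-energy or trajectory-acceleration estimates produced by the auxiliary model.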