Complexity-Balanced Diffusion Splitting

Abstract

Standard continuous-time generative models rely on monolithic architectures that must navigate vastly different signal regimes, from isotropic noise to intricate data distributions. While scaling model capacity improves performance, deploying a massive network uniformly across the entire generative timeline is inherently inefficient. In this work, we propose Complexity-Balanced Splitting (CBS), a principled framework for temporal capacity allocation that distributes the generative workload across multiple specialized sub-networks. Grounded in function approximation theory and de Boor's equidistribution principle, CBS partitions the diffusion timeline into segments of equal approximation burden, allocating more representational capacity to regions where the generative dynamics are more difficult to model. To estimate this local complexity, we introduce two complementary and tractable monitor functions: a spatial measure based on the flow's Dirichlet energy, and a geometric measure based on the acceleration of the sampling trajectories. Using a lightweight auxiliary model to estimate these complexity profiles, our approach eliminates the need for heuristic temporal splits or computationally expensive search procedures. Extensive evaluation across multiple architectures (SiT, JiT, and UNet) and datasets demonstrates that CBS consistently improves synthesis quality without increasing per-step inference cost. In particular, CBS improves FID by approximately 35% on SiT-XL with CFG relative to naive temporal partitioning. The project page is available at https://noamissachar.github.io/CBS/.
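
The equidistribution idea mentioned above can be illustrated with a minimal sketch: given a positive monitor function sampled on a time grid, choose breakpoints so that every segment carries the same share of the monitor's integral. This is not the authors' implementation. The function name `equidistribute` and the toy Gaussian-bump monitor profile are assumptions made purely for illustration; in CBS the profile would instead come from the Dirichlet-energy or trajectory-acceleration estimates produced by the auxiliary model.

```python
import numpy as np


def equidistribute(t, monitor, num_segments):
    """Return num_segments + 1 breakpoints on the grid `t` such that each
    segment holds an equal share of the integral of `monitor`.

    Hypothetical helper for illustration; assumes `t` is increasing and
    `monitor` is strictly positive so the cumulative integral is invertible.
    """
    t = np.asarray(t, dtype=float)
    m = np.asarray(monitor, dtype=float)
    # Cumulative trapezoidal integral of the monitor function, starting at 0.
    increments = 0.5 * (m[1:] + m[:-1]) * np.diff(t)
    cumulative = np.concatenate([[0.0], np.cumsum(increments)])
    # Equal fractions of the total integral, one target level per breakpoint.
    targets = np.linspace(0.0, cumulative[-1], num_segments + 1)
    # Invert the cumulative integral by linear interpolation.
    return np.interp(targets, cumulative, t)


# Toy complexity profile (an assumption, not data from the paper):
# a flat baseline with a bump of "harder" dynamics around t = 0.8.
t = np.linspace(0.0, 1.0, 1001)
monitor = 1.0 + 9.0 * np.exp(-((t - 0.8) / 0.1) ** 2)

breaks = equidistribute(t, monitor, num_segments=4)
print(breaks)
```

With a constant monitor the breakpoints reduce to a uniform split of the timeline; with the bump, segments near t = 0.8 become shorter, which is the sense in which more sub-network capacity is directed to the harder region.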