Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers

February 3, 2026
Authors: Bozhou Li, Yushuo Guan, Haolin Li, Bohan Zeng, Yiyan Ji, Yue Ding, Pengfei Wan, Kun Gai, Yuanxing Zhang, Wentao Zhang
cs.AI

Abstract

Recent DiT-based text-to-image models increasingly adopt LLMs as text encoders, yet text conditioning remains largely static and often utilizes only a single LLM layer, despite pronounced semantic hierarchy across LLM layers and non-stationary denoising dynamics over both diffusion time and network depth. To better match the dynamic process of DiT generation and thereby enhance the diffusion model's generative capability, we introduce a unified normalized convex fusion framework equipped with lightweight gates to systematically organize multi-layer LLM hidden states via time-wise, depth-wise, and joint fusion. Experiments establish Depth-wise Semantic Routing as the superior conditioning strategy, consistently improving text-image alignment and compositional generation (e.g., +9.97 on the GenAI-Bench Counting task). Conversely, we find that purely time-wise fusion can paradoxically degrade visual generation fidelity. We attribute this to a train-inference trajectory mismatch: under classifier-free guidance, nominal timesteps fail to track the effective SNR, causing semantically mistimed feature injection during inference. Overall, our results position depth-wise routing as a strong and effective baseline and highlight the critical need for trajectory-aware signals to enable robust time-dependent conditioning.
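The abstract describes the conditioning mechanism as a normalized convex fusion of multi-layer LLM hidden states, selected by lightweight gates that can vary with DiT block depth. Below is a minimal sketch of the depth-wise variant under that description; it is not the authors' released code, and names such as `DepthwiseSemanticRouter`, the per-block logit parameterization, and the tensor layout are illustrative assumptions.

```python
# Minimal sketch of depth-wise semantic routing: each DiT block learns a
# softmax (convex, normalized) weighting over stacked LLM hidden-state layers.
# This is an assumption-based illustration, not the paper's implementation.
import torch
import torch.nn as nn


class DepthwiseSemanticRouter(nn.Module):
    """Per-DiT-block convex fusion over L candidate LLM layers."""

    def __init__(self, num_llm_layers: int, num_dit_blocks: int):
        super().__init__()
        # One learnable logit vector per DiT block; softmax keeps the mixture
        # weights non-negative and summing to one (a convex combination).
        self.gate_logits = nn.Parameter(torch.zeros(num_dit_blocks, num_llm_layers))

    def forward(self, llm_hidden_states: torch.Tensor, block_idx: int) -> torch.Tensor:
        # llm_hidden_states: (L, batch, seq_len, dim) stacked LLM layer outputs.
        weights = torch.softmax(self.gate_logits[block_idx], dim=-1)  # (L,)
        # Convex combination over the layer axis -> (batch, seq_len, dim).
        return torch.einsum("l,lbsd->bsd", weights, llm_hidden_states)


if __name__ == "__main__":
    # Toy usage: route 8 LLM layers into a 4-block DiT with hidden dim 64.
    router = DepthwiseSemanticRouter(num_llm_layers=8, num_dit_blocks=4)
    states = torch.randn(8, 2, 16, 64)          # (L, batch, seq, dim)
    cond = router(states, block_idx=2)          # conditioning for block 2
    print(cond.shape)                           # torch.Size([2, 16, 64])
```

A time-wise or joint variant would follow the same pattern, with the gate additionally conditioned on the diffusion timestep (or, per the paper's analysis, on a trajectory-aware signal such as the effective SNR rather than the nominal timestep).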