

Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space

December 31, 2025
Authors: Xingwei Qu, Shaowen Wang, Zihao Huang, Kai Hua, Fan Yin, Rui-Jie Zhu, Jundong Zhou, Qiyang Min, Zihao Wang, Yizhi Li, Tianyu Zhang, He Xing, Zheng Zhang, Yuxuan Song, Tianyu Zheng, Zhiyuan Zeng, Chenghua Lin, Ge Zhang, Wenhao Huang
cs.AI

Abstract

Large Language Models (LLMs) apply uniform computation to all tokens, despite language exhibiting highly non-uniform information density. This token-uniform regime wastes capacity on locally predictable spans while under-allocating computation to semantically critical transitions. We propose Dynamic Large Concept Models (DLCM), a hierarchical language modeling framework that learns semantic boundaries from latent representations and shifts computation from tokens to a compressed concept space where reasoning is more efficient. DLCM discovers variable-length concepts end-to-end without relying on predefined linguistic units. Hierarchical compression fundamentally changes scaling behavior. We introduce the first compression-aware scaling law, which disentangles token-level capacity, concept-level reasoning capacity, and compression ratio, enabling principled compute allocation under fixed FLOPs. To stably train this heterogeneous architecture, we further develop a decoupled μP parametrization that supports zero-shot hyperparameter transfer across widths and compression regimes. At a practical setting (R=4, corresponding to an average of four tokens per concept), DLCM reallocates roughly one-third of inference compute into a higher-capacity reasoning backbone, achieving a +2.69% average improvement across 12 zero-shot benchmarks under matched inference FLOPs.
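To make the compute-reallocation claim concrete, the following is a minimal back-of-envelope sketch, not taken from the paper: it assumes a rough per-layer cost of seq_len × width² and hypothetical layer counts and widths. Under these assumptions, compressing the sequence by R = 4 makes each concept-level layer about 4× cheaper, so the budget saved on token-level layers can fund a much deeper (or wider) reasoning backbone at matched FLOPs.

```python
# Illustrative back-of-envelope arithmetic (assumptions, not DLCM's actual
# architecture): how a compression ratio R frees inference FLOPs that can
# be spent on a higher-capacity reasoning backbone.

def layer_flops(seq_len: int, width: int) -> float:
    """Rough per-layer cost: dominated by matmuls, ~ seq_len * width^2."""
    return seq_len * width ** 2

def total_flops(n_layers: int, seq_len: int, width: int) -> float:
    """Total cost of a stack of identical layers."""
    return n_layers * layer_flops(seq_len, width)

# Hypothetical token-uniform baseline: 24 layers, width 2048, 4096 tokens.
baseline = total_flops(n_layers=24, seq_len=4096, width=2048)

# Hypothetical hierarchical split at compression ratio R = 4:
# a few token-level layers run over all 4096 tokens, while the reasoning
# backbone runs over 4096 / 4 = 1024 concepts, so each backbone layer is
# ~4x cheaper and the remaining budget buys many more backbone layers.
R = 4
token_level = total_flops(n_layers=6, seq_len=4096, width=2048)
budget_left = baseline - token_level
backbone_layer_cost = layer_flops(seq_len=4096 // R, width=2048)
backbone_layers = int(budget_left // backbone_layer_cost)

print(f"baseline FLOPs:              {baseline:.3e}")
print(f"token-level FLOPs:           {token_level:.3e}")
print(f"backbone layers at matched budget: {backbone_layers}")
```

Running this toy calculation, the backbone fits roughly 72 concept-level layers in the budget that would otherwise buy 18 token-level layers, which is the intuition behind reallocating compute toward concept-space reasoning under matched inference FLOPs; the exact split in DLCM is determined by its compression-aware scaling law rather than this simplified cost model.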