

Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space

December 31, 2025
Authors: Xingwei Qu, Shaowen Wang, Zihao Huang, Kai Hua, Fan Yin, Rui-Jie Zhu, Jundong Zhou, Qiyang Min, Zihao Wang, Yizhi Li, Tianyu Zhang, He Xing, Zheng Zhang, Yuxuan Song, Tianyu Zheng, Zhiyuan Zeng, Chenghua Lin, Ge Zhang, Wenhao Huang
cs.AI

Abstract

Large Language Models (LLMs) apply uniform computation to all tokens, despite language exhibiting highly non-uniform information density. This token-uniform regime wastes capacity on locally predictable spans while under-allocating computation to semantically critical transitions. We propose Dynamic Large Concept Models (DLCM), a hierarchical language modeling framework that learns semantic boundaries from latent representations and shifts computation from tokens to a compressed concept space where reasoning is more efficient. DLCM discovers variable-length concepts end-to-end without relying on predefined linguistic units. Hierarchical compression fundamentally changes scaling behavior. We introduce the first compression-aware scaling law, which disentangles token-level capacity, concept-level reasoning capacity, and compression ratio, enabling principled compute allocation under fixed FLOPs. To stably train this heterogeneous architecture, we further develop a decoupled μP parametrization that supports zero-shot hyperparameter transfer across widths and compression regimes. At a practical setting (R=4, corresponding to an average of four tokens per concept), DLCM reallocates roughly one-third of inference compute into a higher-capacity reasoning backbone, achieving a +2.69% average improvement across 12 zero-shot benchmarks under matched inference FLOPs.
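To make the token-to-concept compression concrete, below is a minimal PyTorch sketch of the general idea: a learned scorer marks span boundaries over token hidden states, and each variable-length span is pooled into one concept vector before a (larger) reasoning backbone. All names and details here (BoundaryPredictor, pool_concepts, mean pooling, the 0.5 threshold) are illustrative assumptions, not the authors' implementation, which the abstract does not specify.

# Hypothetical illustration (not the DLCM code): pool variable-length token
# spans into "concepts" using a learned boundary scorer.
import torch
import torch.nn as nn


class BoundaryPredictor(nn.Module):
    """Scores each token position; a high score marks the end of a concept span."""

    def __init__(self, d_model: int):
        super().__init__()
        self.scorer = nn.Linear(d_model, 1)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model) -> boundary probability per position
        return torch.sigmoid(self.scorer(hidden)).squeeze(-1)


def pool_concepts(hidden: torch.Tensor, boundaries: torch.Tensor, threshold: float = 0.5):
    """Mean-pool each contiguous token span that ends at a predicted boundary.

    Returns one (num_concepts, d_model) tensor per batch element, since the
    number of concepts differs across sequences.
    """
    concepts = []
    for h, b in zip(hidden, boundaries):
        spans, start = [], 0
        for t in range(h.size(0)):
            if b[t] >= threshold or t == h.size(0) - 1:  # force a final boundary
                spans.append(h[start : t + 1].mean(dim=0))
                start = t + 1
        concepts.append(torch.stack(spans))
    return concepts


if __name__ == "__main__":
    torch.manual_seed(0)
    batch, seq_len, d_model = 2, 16, 64
    token_hidden = torch.randn(batch, seq_len, d_model)  # output of a token-level encoder

    predictor = BoundaryPredictor(d_model)
    concept_seqs = pool_concepts(token_hidden, predictor(token_hidden))

    for i, c in enumerate(concept_seqs):
        print(f"sequence {i}: {seq_len} tokens -> {c.shape[0]} concepts")
    # The compressed concept sequence would then feed the reasoning backbone,
    # which runs over roughly seq_len / R positions instead of seq_len.

The point of the compression is visible in the last comment: with an average span of about four tokens (R=4), the reasoning backbone attends over roughly a quarter as many positions, which is the budget the paper reports reallocating into a higher-capacity backbone at matched inference FLOPs.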