Test-Time Scaling in Diffusion LLMs via Hidden Semi-Autoregressive Experts
October 6, 2025
Authors: Jihoon Lee, Hoyeon Moon, Kevin Zhai, Arun Kumar Chithanar, Anit Kumar Sahu, Soummya Kar, Chul Lee, Souradip Chakraborty, Amrit Singh Bedi
cs.AI
Abstract
Diffusion-based large language models (dLLMs) are trained flexibly to model
extreme dependence in the data distribution; however, how to best utilize this
information at inference time remains an open problem. In this work, we uncover
an interesting property of these models: dLLMs trained on textual data
implicitly learn a mixture of semi-autoregressive experts, where different
generation orders reveal different specialized behaviors. We show that
committing to any single, fixed inference-time schedule, a common practice,
collapses performance by failing to leverage this latent ensemble. To address
this, we introduce HEX (Hidden semiautoregressive EXperts for test-time
scaling), a training-free inference method that ensembles across heterogeneous
block schedules. By taking a majority vote over generation paths produced with
diverse block sizes, HEX robustly avoids the failure modes associated with any single fixed
schedule. On reasoning benchmarks such as GSM8K, it boosts accuracy by up to
3.56X (from 24.72% to 88.10%), outperforming top-K margin inference and
specialized fine-tuned methods like GRPO, without additional training. HEX also
yields significant gains on the MATH benchmark (from 16.40% to 40.00%), on scientific
reasoning with ARC-C (from 54.18% to 87.80%), and on TruthfulQA (from 28.36% to 57.46%).
Our results establish a new paradigm for test-time scaling in diffusion-based
LLMs (dLLMs), revealing that the sequence in which masking is performed plays a
critical role in determining performance during inference.
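
To make the abstract's mechanism concrete, here is a minimal sketch of voting over heterogeneous block schedules. The helpers `generate_fn`, `extract_answer_fn`, and the specific block sizes are illustrative assumptions, not the paper's implementation: the idea is simply to decode the same prompt under several semi-autoregressive block sizes (each inducing a different generation order, i.e. a different "hidden" expert) and take a majority vote over the parsed answers.

```python
from collections import Counter

def hex_majority_vote(prompt, generate_fn, extract_answer_fn,
                      block_sizes=(4, 8, 16, 32, 64)):
    """Ensemble a dLLM over heterogeneous semi-autoregressive block schedules.

    generate_fn(prompt, block_size=...) and extract_answer_fn(text) are
    hypothetical stand-ins for the model's block-wise decoding routine and an
    answer parser (e.g. the final number of a GSM8K solution).
    """
    votes = Counter()
    completions = {}
    for block_size in block_sizes:
        # Each block size induces a different unmasking order, so each pass
        # queries a different implicit semi-autoregressive expert.
        text = generate_fn(prompt, block_size=block_size)
        answer = extract_answer_fn(text)
        if answer is not None:
            votes[answer] += 1
            completions.setdefault(answer, text)

    if not votes:
        return None, None

    # Plain majority vote over the extracted answers.
    best_answer, _ = votes.most_common(1)[0]
    return best_answer, completions[best_answer]
```

Under these assumptions, ties could also be broken by model confidence or sequence likelihood; the abstract specifies only a plain majority vote, which is what the sketch implements.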