TESS 2: A Large-Scale Generalist Diffusion Language Model
February 19, 2025
Authors: Jaesung Tae, Hamish Ivison, Sachin Kumar, Arman Cohan
cs.AI
Abstract
We introduce TESS 2, a general instruction-following diffusion language model
that outperforms contemporary instruction-tuned diffusion models and matches,
and sometimes exceeds, strong autoregressive (AR) models. We train TESS 2 by
first adapting a strong AR model via continued pretraining with the usual
cross-entropy loss as the diffusion loss, and then performing further instruction
tuning. We find that adaptation training as well as the choice of the base
model is crucial for training good instruction-following diffusion models. We
further propose reward guidance, a novel and modular inference-time guidance
procedure to align model outputs without needing to train the underlying model.
Finally, we show that TESS 2 further improves with increased inference-time
compute, highlighting the utility of diffusion LMs in having fine-grained
controllability over the amount of compute used at inference time. Code and
models are available at https://github.com/hamishivi/tess-2.
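The abstract does not spell out the mechanics of reward guidance, but it can be pictured as a classifier-guidance-style update: at each denoising step, the intermediate token distribution is nudged along the gradient of a reward signal, so outputs are steered without ever updating the underlying model's weights. The sketch below is a minimal illustration under that assumption; the function names, the toy reward (probability mass on a chosen token), and the single-step analytic update are all hypothetical stand-ins, not the paper's actual procedure.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the vocabulary axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def reward_guided_step(logits, reward_token=0, guidance_scale=1.0):
    """Hypothetical reward-guided update for one denoising step.

    Toy reward r(p) = p[reward_token]; its gradient with respect to the
    logits is p_t * (one_hot(t) - p). Adding that gradient to the logits
    steers the intermediate token distribution toward higher reward
    while leaving the model's parameters untouched.
    """
    p = softmax(logits)                          # (seq_len, vocab) distribution
    p_t = p[..., reward_token:reward_token + 1]  # reward-token probability
    one_hot = np.zeros_like(p)
    one_hot[..., reward_token] = 1.0
    grad = p_t * (one_hot - p)                   # analytic d r / d logits
    return logits + guidance_scale * grad
```

Because the guidance term is computed purely at inference time, a different reward function can be swapped in per decoding run, which is the modularity the abstract highlights.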