TESS 2: A Large-Scale Generalist Diffusion Language Model
February 19, 2025
Authors: Jaesung Tae, Hamish Ivison, Sachin Kumar, Arman Cohan
cs.AI
Abstract
We introduce TESS 2, a general instruction-following diffusion language model
that outperforms contemporary instruction-tuned diffusion models and matches,
and sometimes exceeds, strong autoregressive (AR) models. We train TESS 2 by
first adapting a strong AR model via continued pretraining with the usual
cross-entropy loss as the diffusion loss, and then performing further instruction
tuning. We find that adaptation training as well as the choice of the base
model is crucial for training good instruction-following diffusion models. We
further propose reward guidance, a novel and modular inference-time guidance
procedure to align model outputs without needing to train the underlying model.
Finally, we show that TESS 2 further improves with increased inference-time
compute, highlighting the utility of diffusion LMs in having fine-grained
controllability over the amount of compute used at inference time. Code and
models are available at https://github.com/hamishivi/tess-2.
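The abstract does not spell out the mechanics of reward guidance, but it can be pictured as a classifier-guidance-style update: at each denoising step, the intermediate token distribution is nudged along the gradient of a reward signal, so outputs are steered without ever updating the underlying model's weights. The sketch below is a minimal illustration under that assumption; the function names, the toy reward (probability mass on a chosen token), and the single-step analytic update are all hypothetical stand-ins, not the paper's actual procedure.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the vocabulary axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def reward_guided_step(logits, reward_token=0, guidance_scale=1.0):
    """Hypothetical reward-guided update for one denoising step.

    Toy reward r(p) = p[reward_token]; its gradient with respect to the
    logits is p_t * (one_hot(t) - p). Adding that gradient to the logits
    steers the intermediate token distribution toward higher reward
    while leaving the model's parameters untouched.
    """
    p = softmax(logits)                          # (seq_len, vocab) distribution
    p_t = p[..., reward_token:reward_token + 1]  # reward-token probability
    one_hot = np.zeros_like(p)
    one_hot[..., reward_token] = 1.0
    grad = p_t * (one_hot - p)                   # analytic d r / d logits
    return logits + guidance_scale * grad
```

Because the guidance term is computed purely at inference time, a different reward function can be swapped in per decoding run, which is the modularity the abstract highlights.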