Adaptive Classifier-Free Guidance via Dynamic Low-Confidence Masking
May 26, 2025
作者: Pengxiang Li, Shilin Yan, Joey Tsai, Renrui Zhang, Ruichuan An, Ziyu Guo, Xiaowei Gao
cs.AI
Abstract
Classifier-Free Guidance (CFG) significantly enhances controllability in
generative models by interpolating conditional and unconditional predictions.
However, standard CFG often employs a static unconditional input, which can be
suboptimal for iterative generation processes where model uncertainty varies
dynamically. We introduce Adaptive Classifier-Free Guidance (A-CFG), a novel
method that tailors the unconditional input by leveraging the model's
instantaneous predictive confidence. At each step of an iterative (masked)
diffusion language model, A-CFG identifies tokens in the currently generated
sequence for which the model exhibits low confidence. These tokens are
temporarily re-masked to create a dynamic, localized unconditional input. This
focuses CFG's corrective influence precisely on areas of ambiguity, leading to
more effective guidance. We integrate A-CFG into a state-of-the-art masked
diffusion language model and demonstrate its efficacy. Experiments on diverse
language generation benchmarks show that A-CFG yields substantial improvements
over standard CFG, achieving, for instance, a 3.9 point gain on GPQA. Our work
highlights the benefit of dynamically adapting guidance mechanisms to model
uncertainty in iterative generation.
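To make the mechanism described in the abstract concrete, the sketch below shows what one guided decoding step with dynamic low-confidence re-masking could look like. It is a minimal illustration based only on the abstract, not the paper's implementation: the function name a_cfg_step, the HuggingFace-style model(...).logits interface, the use of max softmax probability as the confidence measure, the fixed re-masking fraction, and the prompt-concatenation scheme are all assumptions made for the example.

    # Hypothetical sketch of one A-CFG decoding step (assumptions: a
    # HuggingFace-style masked diffusion LM returning .logits, max softmax
    # probability as confidence, and a fixed fraction of tokens re-masked).
    import torch
    import torch.nn.functional as F

    def a_cfg_step(model, tokens, cond_ids, mask_id,
                   guidance_scale=2.0, remask_frac=0.1):
        """One guided denoising step with dynamic low-confidence re-masking."""
        # 1. Conditional pass: predict over the current partially generated
        #    sequence given the prompt (cond_ids).
        cond_out = model(torch.cat([cond_ids, tokens], dim=1)).logits
        cond_logits = cond_out[:, cond_ids.size(1):]          # (B, L, V)

        # 2. Per-token confidence = max softmax probability.
        confidence = F.softmax(cond_logits, dim=-1).max(dim=-1).values  # (B, L)

        # 3. Re-mask the least confident tokens to build a dynamic,
        #    localized unconditional input.
        k = max(1, int(remask_frac * tokens.size(1)))
        low_conf_idx = confidence.topk(k, dim=-1, largest=False).indices
        uncond_tokens = tokens.clone()
        uncond_tokens.scatter_(1, low_conf_idx, mask_id)

        # 4. Unconditional pass on the re-masked sequence (no prompt).
        uncond_logits = model(uncond_tokens).logits            # (B, L, V)

        # 5. Standard CFG interpolation, now focused on the ambiguous spots.
        guided = uncond_logits + guidance_scale * (cond_logits - uncond_logits)
        return guided.argmax(dim=-1)

The key difference from standard CFG in this sketch is step 3: instead of a static, fully masked unconditional input, only the currently ambiguous positions are re-masked, so the conditional/unconditional contrast concentrates on them.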