Adaptive Classifier-Free Guidance via Dynamic Low-Confidence Masking

Abstract

Classifier-Free Guidance (CFG) significantly enhances controllability in generative models by interpolating conditional and unconditional predictions. However, standard CFG often employs a static unconditional input, which can be suboptimal for iterative generation processes in which model uncertainty varies dynamically. We introduce Adaptive Classifier-Free Guidance (A-CFG), a novel method that tailors the unconditional input using the model's instantaneous predictive confidence. At each step of an iterative (masked) diffusion language model, A-CFG identifies the tokens in the currently generated sequence for which the model exhibits low confidence. These tokens are temporarily re-masked to create a dynamic, localized unconditional input. This focuses CFG's corrective influence precisely on areas of ambiguity, leading to more effective guidance. We integrate A-CFG into a state-of-the-art masked diffusion language model and demonstrate its efficacy. Experiments on diverse language generation benchmarks show that A-CFG yields substantial improvements over standard CFG, achieving, for instance, a 3.9-point gain on GPQA. Our work highlights the benefit of dynamically adapting guidance mechanisms to model uncertainty in iterative generation.
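The step described in the abstract (find low-confidence tokens, re-mask them to form the unconditional input, then apply guidance) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: the `MASK` token, the `toy_model` stand-in, the `ratio` and `w` parameters, and the helper functions. The sketch also assumes that confidence is the maximum softmax probability, that a fixed fraction of generated positions is re-masked, and that guidance uses the common form (1 + w) * cond - w * uncond; the abstract does not confirm any of these choices.

```python
import math

MASK = "[MASK]"  # hypothetical mask token, not taken from the paper


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def low_confidence_positions(cond_logits, generated, ratio):
    """Pick the generated positions whose top probability is lowest.

    Assumption: confidence is the max softmax probability, and a fixed
    fraction `ratio` of the generated positions is re-masked.
    """
    confidence = {i: max(softmax(cond_logits[i])) for i in generated}
    k = int(len(generated) * ratio)
    return sorted(sorted(confidence, key=confidence.get)[:k])


def remask(tokens, positions):
    """Return a copy of `tokens` with the given positions replaced by MASK."""
    out = list(tokens)
    for i in positions:
        out[i] = MASK
    return out


def cfg_combine(cond_logits, uncond_logits, w):
    """Common CFG form (assumed here): (1 + w) * cond - w * uncond."""
    return [
        [(1 + w) * c - w * u for c, u in zip(cond_row, uncond_row)]
        for cond_row, uncond_row in zip(cond_logits, uncond_logits)
    ]


def a_cfg_step(model, tokens, generated, ratio=0.5, w=1.0):
    """One guidance step: conditional pass, dynamic re-masking, CFG."""
    cond = model(tokens)
    positions = low_confidence_positions(cond, generated, ratio)
    uncond = model(remask(tokens, positions))  # localized unconditional input
    return cfg_combine(cond, uncond, w), positions


def toy_model(tokens):
    """Stand-in for a masked diffusion LM: fixed logits per input token."""
    table = {
        "a": [4.0, 0.0, 0.0],   # confident prediction
        "b": [0.5, 0.4, 0.3],   # near-uniform, low confidence
        MASK: [0.0, 0.0, 0.0],  # uninformative when masked
    }
    return [table[t] for t in tokens]


# Position 0 plays the role of the prompt; positions 1-3 are generated.
guided, remasked = a_cfg_step(toy_model, ["a", "b", "a", "b"], generated=[1, 2, 3])
```

In this toy setting, positions that are not re-masked receive identical conditional and unconditional logits, so guidance leaves them unchanged and only the re-masked, low-confidence positions are adjusted. This mirrors the localized effect the abstract describes, under the assumptions stated above.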

