Antidistillation Sampling

Abstract

Frontier models that generate extended reasoning traces inadvertently produce rich token sequences that can facilitate model distillation. Recognizing this vulnerability, model owners may seek sampling strategies that limit the effectiveness of distillation without compromising model performance. Antidistillation sampling provides exactly this capability. By strategically modifying a model's next-token probability distribution, antidistillation sampling poisons reasoning traces, rendering them significantly less effective for distillation while preserving the model's practical utility. For further details, see https://antidistillation.com.
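To make the idea of modifying a next-token distribution concrete, the following is a minimal sketch and not the authors' method. The abstract does not say how the adjustment is computed, so the per-token `penalty` scores, the `strength` and `temperature` parameters, and all function names here are assumptions introduced purely for illustration. The sketch only shows the general shape: shift the logits by a scaled score, renormalize, and sample.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adjusted_distribution(logits, penalty, strength=1.0, temperature=1.0):
    """Return next-token probabilities after subtracting a scaled penalty.

    ASSUMPTION: `penalty[i]` is a hypothetical score for how useful token i
    would be to a distilling student. How such a score is obtained is not
    described in the abstract; here it is simply an input.
    """
    if len(logits) != len(penalty):
        raise ValueError("logits and penalty must have the same length")
    shifted = [l / temperature - strength * p for l, p in zip(logits, penalty)]
    return softmax(shifted)


def sample_token(probs, rng):
    """Draw one token index according to `probs`."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [1.0, 2.0, 3.0]      # toy teacher logits over a 3-token vocabulary
    penalty = [0.0, 0.0, 1.0]     # hypothetical: token 2 is penalized
    rng = random.Random(0)

    base = adjusted_distribution(logits, penalty, strength=0.0)
    poisoned = adjusted_distribution(logits, penalty, strength=2.0)
    print("unmodified:", [round(p, 3) for p in base])
    print("adjusted:  ", [round(p, 3) for p in poisoned])
    print("sampled token index:", sample_token(poisoned, rng))
```

With `strength=0.0` the function reduces to ordinary temperature sampling, so in this sketch a single parameter controls how far the sampled traces are allowed to drift from the unmodified distribution.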
