SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models

Abstract

We introduce SelfCite, a novel self-supervised approach that aligns LLMs to generate high-quality, fine-grained, sentence-level citations for the statements in their generated responses. Instead of relying only on costly and labor-intensive annotations, SelfCite leverages a reward signal provided by the LLM itself through context ablation: if a citation is necessary, removing the cited text from the context should prevent the same response; if it is sufficient, retaining the cited text alone should preserve the same response. This reward can guide an inference-time best-of-N sampling strategy to significantly improve citation quality, and it can also be used in preference optimization to fine-tune models directly to generate better citations. SelfCite's effectiveness is demonstrated by an increase in citation F1 of up to 5.3 points on the LongBench-Cite benchmark across five long-form question answering tasks.
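The context-ablation reward and its use in best-of-N sampling can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `log_prob` scorer interface, the toy word-overlap scorer, and the choice to simply add the necessity and sufficiency terms are all hypothetical. A real setup would replace `toy_log_prob` with the LLM's own log-probability of the response given the (ablated) context.

```python
from typing import Callable, List, Sequence

# Hypothetical scorer interface: log p(response | context sentences).
LogProb = Callable[[Sequence[str], str], float]


def ablation_reward(context: Sequence[str], cited: Sequence[int],
                    response: str, log_prob: LogProb) -> float:
    """Score a citation (a set of context sentence indices) by context ablation."""
    cited_set = set(cited)
    cited_only = [s for i, s in enumerate(context) if i in cited_set]
    without_cited = [s for i, s in enumerate(context) if i not in cited_set]
    full = log_prob(context, response)
    # Necessity: removing the cited text should make the response less likely.
    necessity = full - log_prob(without_cited, response)
    # Sufficiency: the cited text alone should keep the response about as likely.
    sufficiency = log_prob(cited_only, response) - full
    # Assumption: the two terms are combined by a plain sum.
    return necessity + sufficiency


def best_of_n(context: Sequence[str], candidates: Sequence[List[int]],
              response: str, log_prob: LogProb) -> List[int]:
    """Pick the candidate citation with the highest ablation reward."""
    return max(candidates,
               key=lambda c: ablation_reward(context, c, response, log_prob))


def _words(text: str) -> List[str]:
    return [w.strip(".,").lower() for w in text.split()]


def toy_log_prob(context: Sequence[str], response: str) -> float:
    """Toy stand-in for an LLM: each response word missing from the context costs 1."""
    seen = {w for sentence in context for w in _words(sentence)}
    return -float(sum(1 for w in _words(response) if w not in seen))


context = [
    "Paris is the capital of France.",
    "Berlin hosts many museums.",
    "The Seine flows through Paris.",
]
response = "The capital of France is Paris."
candidates = [[0], [1], [2], [1, 2]]  # N sampled citation candidates
best = best_of_n(context, candidates, response, toy_log_prob)
print(best)
```

In this toy run the citation pointing at the first sentence wins, because removing it hurts the response while keeping it alone preserves it. For preference optimization, one could pair the highest- and lowest-reward candidates as chosen and rejected examples; that pairing is likewise an assumption, since the abstract does not spell out the details.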
