SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models

Abstract

We introduce SelfCite, a novel self-supervised approach that aligns LLMs to generate high-quality, fine-grained, sentence-level citations for the statements in their generated responses. Rather than relying solely on costly and labor-intensive annotations, SelfCite leverages a reward signal provided by the LLM itself through context ablation: if a citation is necessary, removing the cited text from the context should prevent the same response; if it is sufficient, retaining the cited text alone should preserve the same response. This reward can guide a best-of-N sampling strategy at inference time to significantly improve citation quality, and it can also be used in preference optimization to fine-tune models directly to generate better citations. The effectiveness of SelfCite is demonstrated by an increase in citation F1 of up to 5.3 points on the LongBench-Cite benchmark across five long-form question answering tasks.
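The context-ablation reward and its use in best-of-N selection can be illustrated with a minimal sketch. The abstract does not state the exact formula, so this assumes the reward is a sum of two log-probability differences: a necessity term (how much the response's likelihood drops when the cited sentences are removed) and a sufficiency term (how well the likelihood holds when only the cited sentences are kept). The names `log_prob`, `ablation_reward`, `best_of_n`, and `toy_log_prob` are hypothetical, and the toy scorer is a word-overlap stand-in for a real LLM's log p(response | context).

```python
from typing import Callable, List, Sequence

# Hypothetical scorer type: returns log p(response | context sentences).
LogProbFn = Callable[[Sequence[str], str], float]


def ablation_reward(log_prob: LogProbFn, context: Sequence[str],
                    cited: Sequence[int], response: str) -> float:
    """Assumed reward: necessity + sufficiency of the cited sentence indices."""
    cited_set = set(cited)
    full = log_prob(context, response)
    without_cited = log_prob(
        [s for i, s in enumerate(context) if i not in cited_set], response)
    only_cited = log_prob(
        [s for i, s in enumerate(context) if i in cited_set], response)
    necessity = full - without_cited   # large if removing the citation hurts
    sufficiency = only_cited - full    # near zero if the citation alone suffices
    return necessity + sufficiency


def best_of_n(log_prob: LogProbFn, context: Sequence[str],
              candidates: Sequence[List[int]], response: str) -> List[int]:
    """Pick the candidate citation (list of sentence indices) with top reward."""
    return max(candidates,
               key=lambda c: ablation_reward(log_prob, context, c, response))


def _tokens(text: str) -> List[str]:
    return [w.strip(".,;:!?").lower() for w in text.split()]


def toy_log_prob(context: Sequence[str], response: str) -> float:
    """Toy stand-in for an LLM: minus the number of response words
    that do not appear anywhere in the context."""
    seen = {w for sentence in context for w in _tokens(sentence)}
    return -float(sum(w not in seen for w in _tokens(response)))


context = [
    "Paris is the capital of France.",
    "Bananas are yellow.",
    "The Seine flows through Paris.",
]
response = "Paris is the capital of France."
candidates = [[1], [2], [0]]  # N = 3 sampled citation candidates
print(best_of_n(toy_log_prob, context, candidates, response))
```

In this toy setting the sentence that actually supports the response receives the highest reward. The same scores could, under the same assumptions, rank candidates into preferred and rejected pairs for preference optimization.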
