LOGO -- Long cOntext aliGnment via efficient preference Optimization
October 24, 2024
Authors: Zecheng Tang, Zechen Sun, Juntao Li, Qiaoming Zhu, Min Zhang
cs.AI
Abstract
Long-context models (LCMs) have shown great potential in processing long input sequences (even more than 100M tokens) conveniently and effectively. Alongside this progress, recent research has pointed out that LCMs can accurately locate token-level salient information within the context. Yet the generation performance of these LCMs is far from satisfactory and can yield misaligned responses such as hallucinations. To enhance the generation capability of LCMs, existing works have investigated the effects of data size and quality for both pre-training and instruction tuning. Though they achieve meaningful improvement, previous methods fall short in either effectiveness or efficiency. In this paper, we introduce LOGO (Long cOntext aliGnment via efficient preference Optimization), the first training strategy to introduce preference optimization for long-context alignment. To overcome the GPU memory bound imposed by long sequences, LOGO employs a reference-free preference optimization strategy and adopts a position synthesis method to construct the training data. By training with only 0.3B tokens of data on a single 8×A800 GPU machine for 16 hours, LOGO allows the Llama-3-8B-Instruct-80K model to achieve performance comparable to GPT-4 on real-world long-context tasks while preserving the model's original capabilities on other tasks, e.g., language modeling and MMLU. Moreover, LOGO can extend the model's context window size while enhancing its generation performance.
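The abstract names a reference-free preference objective as the key to fitting long-sequence training in GPU memory: no frozen reference model has to be held alongside the policy. The snippet below is a minimal sketch of one such objective in the SimPO style; the exact formulation, the hyperparameters `beta` and `gamma`, and the length normalization are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def reference_free_preference_loss(
    chosen_logps: torch.Tensor,    # summed log-prob of each preferred response, shape (batch,)
    rejected_logps: torch.Tensor,  # summed log-prob of each dis-preferred response, shape (batch,)
    chosen_lens: torch.Tensor,     # response lengths in tokens, shape (batch,)
    rejected_lens: torch.Tensor,
    beta: float = 2.0,             # assumed scaling of the reward margin
    gamma: float = 1.0,            # assumed target margin
) -> torch.Tensor:
    # Length-normalized log-probabilities serve as the implicit reward,
    # so no frozen reference model needs to be kept in GPU memory.
    r_chosen = chosen_logps / chosen_lens
    r_rejected = rejected_logps / rejected_lens
    # Push the preferred response's reward above the dis-preferred one by gamma.
    return -F.logsigmoid(beta * (r_chosen - r_rejected) - gamma).mean()
```

Because the implicit reward is the policy's own log-probability, no second model is loaded, which matters when a single long sequence already consumes most of the activation memory on an 80K-token context window.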
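The "position synthesis method" for constructing training data is likewise only named in the abstract. One simple way to make the idea concrete is to train on short chunks whose position ids are sampled from a much larger window, in the spirit of positional-indices synthesis methods such as PoSE; the function below is a hypothetical illustration of that idea, not LOGO's actual procedure.

```python
import torch
from typing import Optional

def synthesize_position_ids(
    seq_len: int,
    target_window: int,
    generator: Optional[torch.Generator] = None,
) -> torch.Tensor:
    """Hypothetical sketch: place a short training chunk of `seq_len` tokens at
    randomly sampled positions inside a much larger `target_window`, so the model
    sees long-range relative distances without a long sequence in memory."""
    assert seq_len <= target_window, "cannot sample more positions than the window holds"
    # Sample distinct positions from the large window, then sort so the
    # synthesized position ids remain monotonically increasing.
    positions = torch.randperm(target_window, generator=generator)[:seq_len]
    return positions.sort().values

# Example: an 8K-token chunk trained as if it sat inside an 80K-position window.
pos_ids = synthesize_position_ids(seq_len=8192, target_window=81920)
```

Under this scheme the attention computation only ever sees `seq_len` tokens, which is what keeps the memory footprint small enough for the single-machine, 16-hour training budget the abstract reports.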