LOGO -- Long cOntext aliGnment via efficient preference Optimization
October 24, 2024
Authors: Zecheng Tang, Zechen Sun, Juntao Li, Qiaoming Zhu, Min Zhang
cs.AI
Abstract
Long-context models (LCMs) have shown great potential in processing long input sequences (even more than 100M tokens) conveniently and effectively. Alongside this progress, recent research has pointed out that LCMs can accurately locate token-level salient information within the context. Yet the generation performance of these LCMs is far from satisfactory and can yield misaligned responses such as hallucinations. To enhance the generation capability of LCMs, existing works have investigated the effects of data size and quality for both pre-training and instruction tuning. Though they achieve meaningful improvement, previous methods fall short in either effectiveness or efficiency. In this paper, we introduce LOGO (Long cOntext aliGnment via efficient preference Optimization), the first training strategy to introduce preference optimization for long-context alignment. To overcome the GPU memory bound imposed by long sequences, LOGO employs a reference-free preference optimization strategy and adopts a position synthesis method to construct the training data. By training with only 0.3B tokens of data on a single 8×A800 GPU machine for 16 hours, LOGO allows the Llama-3-8B-Instruct-80K model to achieve performance comparable to GPT-4 on real-world long-context tasks while preserving the model's original capabilities on other tasks, e.g., language modeling and MMLU. Moreover, LOGO can extend the model's context window size while enhancing its generation performance.
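The abstract names a reference-free preference objective as the key to fitting long-sequence training in GPU memory: no frozen reference model has to be held alongside the policy. The snippet below is a minimal sketch of one such objective in the SimPO style; the exact formulation, the hyperparameters `beta` and `gamma`, and the length normalization are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def reference_free_preference_loss(
    chosen_logps: torch.Tensor,    # summed log-prob of each preferred response, shape (batch,)
    rejected_logps: torch.Tensor,  # summed log-prob of each dis-preferred response, shape (batch,)
    chosen_lens: torch.Tensor,     # response lengths in tokens, shape (batch,)
    rejected_lens: torch.Tensor,
    beta: float = 2.0,             # assumed scaling of the reward margin
    gamma: float = 1.0,            # assumed target margin
) -> torch.Tensor:
    # Length-normalized log-probabilities serve as the implicit reward,
    # so no frozen reference model needs to be kept in GPU memory.
    r_chosen = chosen_logps / chosen_lens
    r_rejected = rejected_logps / rejected_lens
    # Push the preferred response's reward above the dis-preferred one by gamma.
    return -F.logsigmoid(beta * (r_chosen - r_rejected) - gamma).mean()
```

Because the implicit reward is the policy's own log-probability, no second model is loaded, which matters when a single long sequence already consumes most of the activation memory on an 80K-token context window.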
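The "position synthesis method" for constructing training data is likewise only named in the abstract. One simple way to make the idea concrete is to train on short chunks whose position ids are sampled from a much larger window, in the spirit of positional-indices synthesis methods such as PoSE; the function below is a hypothetical illustration of that idea, not LOGO's actual procedure.

```python
import torch
from typing import Optional

def synthesize_position_ids(
    seq_len: int,
    target_window: int,
    generator: Optional[torch.Generator] = None,
) -> torch.Tensor:
    """Hypothetical sketch: place a short training chunk of `seq_len` tokens at
    randomly sampled positions inside a much larger `target_window`, so the model
    sees long-range relative distances without a long sequence in memory."""
    assert seq_len <= target_window, "cannot sample more positions than the window holds"
    # Sample distinct positions from the large window, then sort so the
    # synthesized position ids remain monotonically increasing.
    positions = torch.randperm(target_window, generator=generator)[:seq_len]
    return positions.sort().values

# Example: an 8K-token chunk trained as if it sat inside an 80K-position window.
pos_ids = synthesize_position_ids(seq_len=8192, target_window=81920)
```

Under this scheme the attention computation only ever sees `seq_len` tokens, which is what keeps the memory footprint small enough for the single-machine, 16-hour training budget the abstract reports.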