Accelerating Direct Preference Optimization with Prefix Sharing
October 27, 2024
Authors: Franklin Wang, Sumanth Hegde
cs.AI
Abstract
Offline paired preference optimization algorithms have become a popular
approach for fine-tuning on preference data, outperforming traditional
supervised fine-tuning in various tasks. However, traditional implementations
often involve redundant computations, especially for tasks with long shared
prompts. We introduce prefix sharing for preference tuning, a novel technique
that processes chosen and rejected responses as one sequence with a shared
prefix. To prevent cross-response contamination, we use a custom block-sparse
attention mask. Our method achieves 1.1-1.5× improvement in training
throughput on popular DPO datasets, without any effect on convergence. When
combined with sequence packing, we observe consistent 1.3-1.6×
speedups, benefiting even datasets with smaller sequence lengths. While we
focus on Direct Preference Optimization (DPO), our approach is applicable to
other paired preference tuning methods. By enhancing computational efficiency,
our work contributes to making preference-based fine-tuning more accessible for
a wider range of applications and model sizes. We open-source our code at
https://github.com/frankxwang/dpo-prefix-sharing.
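The attention pattern described in the abstract can be illustrated with a small sketch. The snippet below is a minimal dense-mask illustration, not the paper's block-sparse kernel; the function name `prefix_sharing_mask` and the example lengths are assumptions made here for clarity. It lays out one packed sequence as [shared prompt | chosen | rejected]: both responses attend causally to the shared prompt, each response attends causally to itself, and rejected tokens are blocked from attending to chosen tokens, preventing cross-response contamination.

```python
# Minimal sketch of the prefix-sharing attention pattern (dense boolean mask,
# True = may attend). The paper uses a block-sparse mask in practice; the
# function name and example lengths here are illustrative assumptions.
import torch

def prefix_sharing_mask(prefix_len: int, chosen_len: int, rejected_len: int) -> torch.Tensor:
    """Mask for one sequence laid out as [shared prompt | chosen | rejected]."""
    total = prefix_len + chosen_len + rejected_len
    causal = torch.tril(torch.ones(total, total)).bool()
    mask = torch.zeros(total, total, dtype=torch.bool)

    c_start, c_end = prefix_len, prefix_len + chosen_len    # chosen response span
    r_start, r_end = c_end, c_end + rejected_len             # rejected response span

    # Prompt tokens: ordinary causal attention within the shared prompt.
    mask[:prefix_len, :prefix_len] = causal[:prefix_len, :prefix_len]

    # Chosen tokens: causal attention over the prompt and earlier chosen tokens.
    mask[c_start:c_end, :c_end] = causal[c_start:c_end, :c_end]

    # Rejected tokens: see the full prompt and earlier rejected tokens,
    # but never the chosen response (no cross-response contamination).
    mask[r_start:r_end, :prefix_len] = True
    mask[r_start:r_end, r_start:r_end] = torch.tril(torch.ones(rejected_len, rejected_len)).bool()

    return mask

# Example: 4-token prompt, 3-token chosen response, 2-token rejected response.
print(prefix_sharing_mask(4, 3, 2).int())
```

Because the shared prompt appears only once in the packed sequence, its forward and backward computation is not duplicated across the chosen and rejected responses, which is where the reported throughput gains for long shared prompts come from.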