
Better Alignment with Instruction Back-and-Forth Translation

August 8, 2024
Authors: Thao Nguyen, Jeffrey Li, Sewoong Oh, Ludwig Schmidt, Jason Weston, Luke Zettlemoyer, Xian Li
cs.AI

Abstract

We propose a new method, instruction back-and-forth translation, to construct high-quality synthetic data grounded in world knowledge for aligning large language models (LLMs). Given documents from a web corpus, we generate and curate synthetic instructions using the backtranslation approach proposed by Li et al. (2023a), and rewrite the responses to improve their quality further based on the initial documents. Fine-tuning with the resulting (backtranslated instruction, rewritten response) pairs yields higher win rates on AlpacaEval than using other common instruction datasets such as Humpback, ShareGPT, Open Orca, Alpaca-GPT4 and Self-instruct. We also demonstrate that rewriting the responses with an LLM outperforms direct distillation, and the two generated text distributions exhibit significant distinction in embedding space. Further analysis shows that our backtranslated instructions are of higher quality than other sources of synthetic instructions, while our responses are more diverse and complex than those obtained from distillation. Overall we find that instruction back-and-forth translation combines the best of both worlds: making use of the information diversity and quantity found on the web, while ensuring the quality of the responses, which is necessary for effective alignment.
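
The abstract describes a two-step pipeline: backtranslate a web document into a synthetic instruction, then rewrite the document into a higher-quality response grounded in its content. The following is a minimal sketch of that flow, assuming a generic `llm` callable that maps a prompt string to generated text; the function names and prompt wording are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of instruction back-and-forth translation.
# `llm` is a placeholder callable (prompt string -> generated text).

def backtranslate_instruction(document: str, llm) -> str:
    """Generate a synthetic instruction the document could answer
    (backtranslation step, following Li et al. (2023a))."""
    prompt = (
        "Write an instruction or question for which the following text "
        f"would be a helpful response:\n\n{document}"
    )
    return llm(prompt)

def rewrite_response(document: str, instruction: str, llm) -> str:
    """Rewrite the original web document into a cleaner response that
    directly answers the instruction, staying grounded in the source."""
    prompt = (
        f"Instruction: {instruction}\n\n"
        f"Source text: {document}\n\n"
        "Rewrite the source text into a clear, direct answer to the "
        "instruction, keeping only information supported by the source."
    )
    return llm(prompt)

def build_pairs(web_documents, llm):
    """Produce (backtranslated instruction, rewritten response) pairs
    used as fine-tuning data."""
    pairs = []
    for doc in web_documents:
        instruction = backtranslate_instruction(doc, llm)
        response = rewrite_response(doc, instruction, llm)
        pairs.append({"instruction": instruction, "response": response})
    return pairs
```

Under this reading, the web corpus supplies diversity and scale, while the rewriting step (rather than distilling answers from scratch) keeps responses anchored to the source documents, which is the quality property the paper highlights.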
