InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct
July 8, 2024
Authors: Yutong Wu, Di Huang, Wenxuan Shi, Wei Wang, Lingzhe Gao, Shihao Liu, Ziyuan Nan, Kaizhao Yuan, Rui Zhang, Xishan Zhang, Zidong Du, Qi Guo, Yewen Pu, Dawei Yin, Xing Hu, Yunji Chen
cs.AI
Abstract
Recent advancements in open-source code large language models (LLMs) have
demonstrated remarkable coding abilities by fine-tuning on the data generated
from powerful closed-source LLMs such as GPT-3.5 and GPT-4 for instruction
tuning. This paper explores how to further improve an instruction-tuned code
LLM by generating data from itself rather than querying closed-source LLMs. Our
key observation is an asymmetry in translation between formal and
informal languages: translating formal language (i.e., code) into informal
language (i.e., natural language) is more straightforward than the reverse.
Based on this observation, we propose INVERSE-INSTRUCT, which summarizes
instructions from code snippets instead of the reverse. Specifically, given an
instruction tuning corpus for code and the resulting instruction-tuned code
LLM, we ask the code LLM to generate additional high-quality instructions for
the original corpus through code summarization and self-evaluation. Then, we
fine-tune the base LLM on the combination of the original corpus and the
self-generated one, which yields a stronger instruction-tuned LLM. We present a
series of code LLMs named InverseCoder, which surpasses the performance of the
original code LLMs on a wide range of benchmarks, including Python text-to-code
generation, multilingual coding, and data-science code generation.
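
The abstract describes the INVERSE-INSTRUCT data-generation loop only at a high level. Below is a minimal sketch of that loop in Python, assuming a generic `generate` interface on the code LLM; the prompt templates, candidate count, and `CodeLLM` abstraction are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of the Inverse-Instruct loop from the abstract:
# summarize instructions from existing code, keep the best candidate via
# self-evaluation, then combine with the original corpus for fine-tuning.
# `code_llm.generate(prompt)` is an assumed interface, not the paper's code.

from dataclasses import dataclass


@dataclass
class Example:
    instruction: str
    code: str


def inverse_instruct(code_llm, corpus: list[Example],
                     n_candidates: int = 4) -> list[Example]:
    """Return a self-generated corpus of (instruction, code) pairs."""
    generated: list[Example] = []
    for ex in corpus:
        # Step 1: code summarization -- ask the instruction-tuned code LLM
        # for several candidate instructions that the snippet would satisfy.
        candidates = [
            code_llm.generate(
                "Write an instruction for which the following code "
                f"is a correct solution:\n{ex.code}"
            )
            for _ in range(n_candidates)
        ]
        # Step 2: self-evaluation -- ask the same model to select the
        # candidate instruction that best matches the code.
        best = code_llm.generate(
            "Choose the instruction that best describes this code.\n"
            f"Code:\n{ex.code}\nCandidates:\n"
            + "\n".join(f"{i}: {c}" for i, c in enumerate(candidates))
        )
        generated.append(Example(instruction=best, code=ex.code))
    return generated


# Fine-tuning the base LLM on `corpus + inverse_instruct(...)` yields the
# stronger instruction-tuned model (InverseCoder); training is not shown.
```

The direction of generation is the point of the design: producing natural-language instructions from code is the easier translation direction per the paper's key observation, so the model's own summaries can serve as high-quality training data without querying a closed-source LLM.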