CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance

Abstract

Existing methods fail to effectively steer Large Language Models (LLMs) between textual reasoning and code generation, leaving symbolic computing capabilities underutilized. We introduce CodeSteer, an effective method for guiding LLM code/text generation. We construct a comprehensive benchmark, SymBench, comprising 37 symbolic tasks with adjustable complexity, and synthesize datasets of 12k multi-round guidance/generation trajectories and 5.5k guidance comparison pairs. We fine-tune the Llama-3-8B model with newly designed multi-round supervised fine-tuning (SFT) and direct preference optimization (DPO). The resulting model, CodeSteerLLM, augmented with the proposed symbolic and self-answer checkers, effectively guides the code/text generation of larger models. Augmenting GPT-4o with CodeSteer raises its average performance score from 53.3 to 86.4, outperforming even the best existing LLMs, OpenAI o1 (82.7), o1-preview (74.8), and DeepSeek R1 (76.8), across all 37 tasks (28 seen, 9 unseen). Although trained for GPT-4o, CodeSteer generalizes well, providing an average performance boost of 41.8 on Claude, Mistral, and GPT-3.5. CodeSteer-guided LLMs fully harness symbolic computing to maintain strong performance on highly complex tasks. Models, datasets, and code are available at https://github.com/yongchao98/CodeSteer-v1.0.
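The multi-round guidance idea can be pictured with a minimal Python sketch. This is not the released CodeSteer implementation: the function names (steer_loop, toy_steer, toy_task_model, toy_checker), the "text"/"code" guidance labels, the round limit, and the stopping rule are all assumptions made for illustration. The toy stand-ins replace the real steering model, task model, and checkers with hard-coded functions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Turn:
    guidance: str  # what the steering model asked for in this round
    answer: str    # what the task model produced in response


def steer_loop(
    question: str,
    steer: Callable[[str, List[Turn]], str],
    task_model: Callable[[str, str], str],
    checker: Callable[[str, str], bool],
    max_rounds: int = 5,  # hypothetical limit, not taken from the paper
) -> Tuple[str, List[Turn]]:
    """Alternate guidance and generation until the checker accepts an answer."""
    history: List[Turn] = []
    answer = ""
    for _ in range(max_rounds):
        guidance = steer(question, history)      # smaller model picks a mode
        answer = task_model(question, guidance)  # larger model follows it
        history.append(Turn(guidance, answer))
        if checker(question, answer):            # stop once the answer checks out
            break
    return answer, history


# Toy stand-ins (assumptions): count the letter "r" in "strawberry".
def toy_steer(question: str, history: List[Turn]) -> str:
    # Try textual reasoning first, then switch to code after a failed round.
    return "text" if not history else "code"


def toy_task_model(question: str, guidance: str) -> str:
    if guidance == "code":
        return str("strawberry".count("r"))  # symbolic computation
    return "2"  # a plausible but wrong textual guess


def toy_checker(question: str, answer: str) -> bool:
    # Recomputes the count independently and compares it with the answer.
    return answer == str(sum(1 for ch in "strawberry" if ch == "r"))


answer, history = steer_loop(
    "How many times does 'r' appear in 'strawberry'?",
    toy_steer,
    toy_task_model,
    toy_checker,
)
```

In this toy run the first round uses textual reasoning and fails the check, so the second round switches to code and the loop stops. A real system would replace the three toy functions with model calls and the checkers described in the abstract.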
