SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors
February 16, 2025
Authors: Bohan Lyu, Siqiao Huang, Zichen Liang
cs.AI
Abstract
Large language models (LLMs) have demonstrated remarkable capabilities in
code-related tasks, such as code understanding and code generation. However, an
equally important yet underexplored question is whether LLMs can serve as
general-purpose surrogate code executors, predicting the output and behavior of
a program without actually running it. To systematically investigate this
capability, we introduce SURGE, a comprehensive benchmark covering eight key
aspects: multi-language programming tasks, competition-level programming
problems, repository-level code analysis, high-cost scientific computing,
time-complexity-intensive algorithms, buggy code analysis, programs dependent
on specific compilers or execution environments, and formal mathematical proof
verification. We evaluate multiple open-source and proprietary LLMs on SURGE
and conduct a scaling study to analyze the impact of model size and training
data scale on surrogate execution accuracy. Additionally, we categorize model
prediction errors and explore potential areas for improvement. Our findings
indicate that while LLMs can predict code execution results in certain cases,
they exhibit limitations in general-purpose surrogate execution. This study
provides empirical insights into the feasibility of using LLMs as surrogate
code executors. Code and dataset are released at
https://github.com/Imbernoulli/SURGE.
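
As a rough illustration of what "surrogate execution accuracy" could mean in practice, the sketch below scores predicted program outputs against the outputs obtained by actually running each program. It is a minimal sketch under stated assumptions, not the SURGE evaluation code: the helper names (run_program, exact_match, surrogate_accuracy) are hypothetical, the exact-match metric is an assumption, and the predicted outputs are hard-coded stand-ins for what a model would return.

```python
import contextlib
import io


def run_program(source: str) -> str:
    """Execute a Python snippet and return what it printed (the ground truth)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})  # fresh namespace for every snippet
    return buffer.getvalue()


def exact_match(predicted: str, actual: str) -> bool:
    """Assumed metric: outputs match after trimming surrounding whitespace."""
    return predicted.strip() == actual.strip()


def surrogate_accuracy(cases: list[tuple[str, str]]) -> float:
    """cases holds (source, predicted_output) pairs.

    Returns the fraction of programs whose predicted output equals
    the output produced by real execution.
    """
    if not cases:
        return 0.0
    hits = sum(exact_match(pred, run_program(src)) for src, pred in cases)
    return hits / len(cases)


# Hypothetical predictions; in a real study these would come from an LLM.
cases = [
    ("print(sum(range(5)))", "10"),   # correct prediction
    ("print('a' * 3)", "aaa"),        # correct prediction
    ("print(2 ** 10)", "1000"),       # wrong: the program prints 1024
]
accuracy = surrogate_accuracy(cases)
```

Here the surrogate side never runs anything; only the scoring harness executes the programs to obtain reference outputs. Real benchmarks would also need sandboxing and timeouts, which this sketch leaves out.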