Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey
January 15, 2026
Authors: Caihua Li, Lianghong Guo, Yanlin Wang, Daya Guo, Wei Tao, Zhenyu Shan, Mingwei Liu, Jiachi Chen, Haoyu Song, Duyu Tang, Hongyu Zhang, Zibin Zheng
cs.AI
Abstract
Issue resolution, a complex Software Engineering (SWE) task integral to real-world development, has emerged as a compelling challenge for artificial intelligence. The establishment of benchmarks like SWE-bench revealed this task as profoundly difficult for large language models, thereby significantly accelerating the evolution of autonomous coding agents. This paper presents a systematic survey of this emerging domain. We begin by examining data construction pipelines, covering automated collection and synthesis approaches. We then provide a comprehensive analysis of methodologies, spanning training-free frameworks with their modular components to training-based techniques, including supervised fine-tuning and reinforcement learning. Subsequently, we discuss critical analyses of data quality and agent behavior, alongside practical applications. Finally, we identify key challenges and outline promising directions for future research. An open-source repository is maintained at https://github.com/DeepSoftwareAnalytics/Awesome-Issue-Resolution to serve as a dynamic resource in this field.