LLPut: Investigating Large Language Models for Bug Report-Based Input Generation
March 26, 2025
Authors: Alif Al Hasan, Subarna Saha, Mia Mohammad Imran, Tarannum Shaila Zaman
cs.AI
Abstract
Failure-inducing inputs play a crucial role in diagnosing and analyzing
software bugs. Bug reports typically contain these inputs, which developers
extract to facilitate debugging. Since bug reports are written in natural
language, prior research has leveraged various Natural Language Processing
(NLP) techniques for automated input extraction. With the advent of Large
Language Models (LLMs), an important research question arises: how effectively
can generative LLMs extract failure-inducing inputs from bug reports? In this
paper, we propose LLPut, a technique to empirically evaluate the performance of
three open-source generative LLMs -- LLaMA, Qwen, and Qwen-Coder -- in
extracting relevant inputs from bug reports. We conduct an experimental
evaluation on a dataset of 206 bug reports to assess the accuracy and
effectiveness of these models. Our findings provide insights into the
capabilities and limitations of generative LLMs in automated bug diagnosis.
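To make the extraction task concrete, the following is a minimal sketch, assuming the Hugging Face transformers library, of prompting an open-source generative LLM to pull a failure-inducing input out of a bug report. It is not the paper's LLPut pipeline: the model checkpoint, the example bug report, and the prompt wording are all illustrative assumptions.

# Minimal sketch (not the paper's LLPut pipeline): prompt an open-source
# generative LLM to extract the failure-inducing input from a bug report.
# The checkpoint, the bug report, and the prompt wording are illustrative
# assumptions, not details taken from the paper.
from transformers import pipeline

extractor = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-Coder-7B-Instruct",  # assumed checkpoint; the paper evaluates LLaMA, Qwen, and Qwen-Coder
)

# Hypothetical bug report in the common title / steps / expected / actual form.
bug_report = """Title: grep hangs on pathological pattern
Steps to reproduce:
  printf 'a%.0s' {1..100000} > long.txt
  grep -E '(a+)+b' long.txt
Expected: grep reports no match and exits
Actual: grep hangs until killed"""

prompt = (
    "Extract the failure-inducing input (the exact command or data that "
    "triggers the bug) from the bug report below. Reply with the input only.\n\n"
    f"Bug report:\n{bug_report}\n\nFailure-inducing input:"
)

# Greedy decoding keeps the extraction deterministic; return only the completion.
result = extractor(prompt, max_new_tokens=64, do_sample=False, return_full_text=False)
print(result[0]["generated_text"].strip())

Under these assumptions, the expected answer is the reproduction command from the report (the grep invocation), which a developer could replay directly during debugging; comparing such model outputs against human-annotated ground truth is the kind of accuracy assessment the abstract describes.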