LLPut: Investigating Large Language Models for Bug Report-Based Input Generation
March 26, 2025
Authors: Alif Al Hasan, Subarna Saha, Mia Mohammad Imran, Tarannum Shaila Zaman
cs.AI
Abstract
Failure-inducing inputs play a crucial role in diagnosing and analyzing
software bugs. Bug reports typically contain these inputs, which developers
extract to facilitate debugging. Since bug reports are written in natural
language, prior research has leveraged various Natural Language Processing
(NLP) techniques for automated input extraction. With the advent of Large
Language Models (LLMs), an important research question arises: how effectively
can generative LLMs extract failure-inducing inputs from bug reports? In this
paper, we propose LLPut, a technique to empirically evaluate the performance of
three open-source generative LLMs -- LLaMA, Qwen, and Qwen-Coder -- in
extracting relevant inputs from bug reports. We conduct an experimental
evaluation on a dataset of 206 bug reports to assess the accuracy and
effectiveness of these models. Our findings provide insights into the
capabilities and limitations of generative LLMs in automated bug diagnosis.
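To make the extraction task concrete, the following is a minimal sketch, assuming the Hugging Face transformers library, of prompting an open-source generative LLM to pull a failure-inducing input out of a bug report. It is not the paper's LLPut pipeline: the model checkpoint, the example bug report, and the prompt wording are all illustrative assumptions.

# Minimal sketch (not the paper's LLPut pipeline): prompt an open-source
# generative LLM to extract the failure-inducing input from a bug report.
# The checkpoint, the bug report, and the prompt wording are illustrative
# assumptions, not details taken from the paper.
from transformers import pipeline

extractor = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-Coder-7B-Instruct",  # assumed checkpoint; the paper evaluates LLaMA, Qwen, and Qwen-Coder
)

# Hypothetical bug report in the common title / steps / expected / actual form.
bug_report = """Title: grep hangs on pathological pattern
Steps to reproduce:
  printf 'a%.0s' {1..100000} > long.txt
  grep -E '(a+)+b' long.txt
Expected: grep reports no match and exits
Actual: grep hangs until killed"""

prompt = (
    "Extract the failure-inducing input (the exact command or data that "
    "triggers the bug) from the bug report below. Reply with the input only.\n\n"
    f"Bug report:\n{bug_report}\n\nFailure-inducing input:"
)

# Greedy decoding keeps the extraction deterministic; return only the completion.
result = extractor(prompt, max_new_tokens=64, do_sample=False, return_full_text=False)
print(result[0]["generated_text"].strip())

Under these assumptions, the expected answer is the reproduction command from the report (the grep invocation), which a developer could replay directly during debugging; comparing such model outputs against human-annotated ground truth is the kind of accuracy assessment the abstract describes.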