DiffTester: Accelerating Unit Test Generation for Diffusion LLMs via Repetitive Pattern
September 29, 2025
Authors: Lekang Yang, Yuetong Liu, Yitong Zhang, Jia Li
cs.AI
Abstract
Software development relies heavily on extensive unit testing, which makes
the efficiency of automated Unit Test Generation (UTG) particularly important.
However, most existing LLMs generate test cases one token at a time in each
forward pass, which leads to inefficient UTG. Recently, diffusion LLMs (dLLMs)
have emerged, offering promising parallel generation capabilities and showing
strong potential for efficient UTG. Despite this advantage, their application
to UTG is still constrained by a clear trade-off between efficiency and test
quality, since increasing the number of tokens generated in each step often
causes a sharp decline in the quality of test cases. To overcome this
limitation, we present DiffTester, an acceleration framework specifically
tailored for dLLMs in UTG. The key idea of DiffTester is that unit tests
targeting the same focal method often share repetitive structural patterns. By
dynamically identifying these common patterns through abstract syntax tree
analysis during generation, DiffTester adaptively increases the number of
tokens produced at each step without compromising the quality of the output. To
enable comprehensive evaluation, we extend the original TestEval benchmark,
which was limited to Python, by introducing additional programming languages
including Java and C++. Extensive experiments on three benchmarks with two
representative models show that DiffTester delivers significant acceleration
while preserving test coverage. Moreover, DiffTester generalizes well across
different dLLMs and programming languages, providing a practical and scalable
solution for efficient UTG in software development. Code and data are publicly
available at https://github.com/wellbeingyang/DLM4UTG-open.