DiffTester: Accelerating Unit Test Generation for Diffusion LLMs via Repetitive Pattern

Abstract

Software development relies heavily on extensive unit testing, which makes the efficiency of automated Unit Test Generation (UTG) particularly important. However, most existing large language models (LLMs) generate test cases one token per forward pass, which makes UTG slow. Diffusion LLMs (dLLMs) have recently emerged with parallel generation capabilities and show strong potential for efficient UTG. Despite this advantage, their application to UTG is still constrained by a clear trade-off between efficiency and test quality: increasing the number of tokens generated in each step often causes a sharp decline in test-case quality. To overcome this limitation, we present DiffTester, an acceleration framework specifically tailored for dLLMs in UTG. The key idea of DiffTester is that unit tests targeting the same focal method often share repetitive structural patterns. By dynamically identifying these common patterns through abstract syntax tree (AST) analysis during generation, DiffTester adaptively increases the number of tokens produced at each step without compromising output quality. To enable comprehensive evaluation, we extend the original TestEval benchmark, which was limited to Python, with additional programming languages including Java and C++. Extensive experiments on three benchmarks with two representative models show that DiffTester delivers significant acceleration while preserving test coverage. Moreover, DiffTester generalizes well across different dLLMs and programming languages, providing a practical and scalable solution for efficient UTG in software development. Code and data are publicly available at https://github.com/wellbeingyang/DLM4UTG-open.
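To make the notion of "repetitive structural patterns" concrete, the following is a minimal sketch of how shared AST structure between unit tests for the same focal method could be detected. It is not the DiffTester implementation: the abstract does not specify how patterns are extracted from partially generated outputs, so the helper names (`structure_signature`, `shared_prefix_length`), the pre-order signature, and the example tests for a hypothetical `add` function are all assumptions made for illustration. The sketch also works on complete test cases, whereas DiffTester operates during generation.

```python
import ast
from typing import List, Sequence


def structure_signature(source: str) -> List[str]:
    """Return the pre-order list of AST node type names for a code snippet.

    Literal values and identifiers are ignored, so two tests that differ
    only in their inputs produce the same signature.
    """

    def visit(node: ast.AST) -> List[str]:
        names = [type(node).__name__]
        for child in ast.iter_child_nodes(node):
            # Skip Load/Store/Del context markers; they carry no structure.
            if isinstance(child, ast.expr_context):
                continue
            names.extend(visit(child))
        return names

    return visit(ast.parse(source))


def shared_prefix_length(signatures: Sequence[Sequence[str]]) -> int:
    """Count how many leading node types all signatures have in common."""
    length = 0
    for column in zip(*signatures):
        if len(set(column)) != 1:
            break
        length += 1
    return length


# Hypothetical tests for a hypothetical focal method `add`.
TEST_SMALL = "def test_add_small():\n    assert add(1, 2) == 3\n"
TEST_LARGE = "def test_add_large():\n    assert add(10, 20) == 30\n"
TEST_ERROR = (
    "def test_add_type_error():\n"
    "    with pytest.raises(TypeError):\n"
    "        add('a', 1)\n"
)

if __name__ == "__main__":
    sig_small = structure_signature(TEST_SMALL)
    sig_large = structure_signature(TEST_LARGE)
    sig_error = structure_signature(TEST_ERROR)
    print("same structure:", sig_small == sig_large)
    print("shared prefix with error test:",
          shared_prefix_length([sig_small, sig_error]))
```

In this sketch, the two assertion-style tests share their entire structure and differ only in literals, which is the kind of repetition the abstract refers to. A framework could plausibly treat such agreed-upon structure as safe to emit in fewer steps, but how DiffTester actually uses the patterns is not described here.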
