Rule2DRC: An LLM Agent Benchmark for DRC Script Synthesis with Execution-Guided Test Generation

Abstract

Manufacturable chip layouts must satisfy thousands of geometry-based design rules, and design rule checking (DRC) enforces them by running executable DRC scripts on layouts. Translating natural-language rules into correct DRC scripts is labor-intensive and requires specialized expertise, which motivates the use of LLM agents for DRC script synthesis and debugging. However, existing benchmarks have small evaluation sets and often score scripts by code similarity rather than execution correctness, and prior machine-learning-based methods either ignore execution feedback or require labeled test layouts as agent input. To address these gaps, we introduce Rule2DRC, a large-scale benchmark for DRC script coding agents with 1,000 rule-to-script tasks and 13,921 evaluation chip layouts for execution-based scoring. Rule2DRC provides an evaluation pipeline that measures functional correctness via DRC execution outcomes without requiring evaluation layouts as input to the agent. We also propose SplitTester, a tester agent for program selection that uses execution feedback to generate discriminative test cases and separate previously indistinguishable candidate scripts, substantially improving Best-of-N selection performance in this domain. We release the code at https://github.com/snu-mllab/Rule2DRC.