Rule2DRC: Benchmarking LLM Agents for DRC Script Synthesis via Execution-Based Test Generation

Abstract

Manufacturable chip layouts must satisfy thousands of geometry-based design rules, and design rule checking (DRC) enforces them by running executable DRC scripts on layouts. Translating natural-language rules into correct DRC scripts is labor-intensive and requires specialized expertise, which motivates LLM agents for DRC script synthesis and debugging. However, existing benchmarks have small evaluation sets and often score scripts by code similarity rather than execution correctness, and prior machine-learning-based methods either ignore execution feedback or require labeled test layouts as input to the agent. To address these gaps, we introduce Rule2DRC, a large-scale benchmark for DRC script coding agents with 1,000 rule-to-script tasks and 13,921 evaluation chip layouts for execution-based scoring. Rule2DRC provides an evaluation pipeline that measures functional correctness via DRC execution outcomes without requiring evaluation layouts as input to the agent. We also propose SplitTester, a tester agent for program selection that uses execution feedback to generate discriminative test cases that separate previously indistinguishable candidate scripts, substantially improving Best-of-N selection performance in this domain. We release the code at https://github.com/snu-mllab/Rule2DRC.
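
To make the idea of a discriminative test case concrete, the toy sketch below groups candidate scripts by their execution outcomes and shows how one extra test layout can split a group that was previously indistinguishable. It is an illustration only, not the SplitTester implementation: candidate DRC scripts are modeled as plain Python callables, layouts are reduced to a single integer spacing value, and every name (`group_by_behavior`, `is_discriminative`, `candidates`) is hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# Assumption: a "script" maps a layout to True (violation) or False (clean).
# Real DRC scripts run in a DRC engine; callables stand in for them here.
Script = Callable[[int], bool]


def group_by_behavior(scripts: Sequence[Script],
                      layouts: Sequence[int]) -> List[List[int]]:
    """Group script indices by their tuple of outcomes on the given layouts."""
    groups: Dict[Tuple[bool, ...], List[int]] = defaultdict(list)
    for idx, script in enumerate(scripts):
        signature = tuple(script(layout) for layout in layouts)
        groups[signature].append(idx)
    return list(groups.values())


def is_discriminative(scripts: Sequence[Script],
                      group: Sequence[int],
                      layout: int) -> bool:
    """Return True if at least two scripts in the group disagree on the layout."""
    return len({scripts[i](layout) for i in group}) > 1


# Three hypothetical candidates for a minimum-spacing rule.
# The layout is just the spacing between two shapes.
candidates: List[Script] = [
    lambda spacing: spacing < 3,   # flags spacing strictly below 3
    lambda spacing: spacing <= 3,  # off-by-one variant
    lambda spacing: spacing < 5,   # wrong threshold
]

# Easy tests cannot tell the candidates apart: all land in one group.
initial_groups = group_by_behavior(candidates, [1, 10])

# A boundary layout (spacing 3) splits candidate 0 from the other two.
splits_at_3 = is_discriminative(candidates, initial_groups[0], 3)
groups_after_3 = group_by_behavior(candidates, [1, 10, 3])

# A second boundary layout (spacing 4) separates the remaining pair.
groups_after_4 = group_by_behavior(candidates, [1, 10, 3, 4])
```

In this toy setting, the tests at spacing 1 and 10 leave all three candidates in a single behavior group, while the boundary cases at 3 and 4 separate them one by one. How the resulting groups are then used to pick a final script is a separate design decision that this sketch does not attempt to model.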