QUASAR: Quantum Assembly Code Generation Using Tool-Augmented LLMs via Agentic RL

Abstract

Designing and optimizing task-specific quantum circuits is crucial to realizing the advantages of quantum computing. Large language model (LLM)-based quantum circuit generation has recently emerged as a promising automated solution. However, two fundamental challenges remain unaddressed: (i) parameterized quantum gates require precise numerical values for optimal performance, and these values depend on multiple factors, including the number of quantum gates, their parameters, and the layout and depth of the circuit; (ii) LLMs often generate low-quality or incorrect quantum circuits because they lack quantum domain-specific knowledge. We propose QUASAR, an agentic reinforcement learning (RL) framework for quantum circuit generation and optimization based on tool-augmented LLMs. To align the LLM with quantum-specific knowledge and improve the generated quantum circuits, QUASAR designs (i) a quantum circuit verification approach that uses external quantum simulators and (ii) a sophisticated hierarchical reward mechanism for RL training. Extensive evaluation shows improvements in both the syntactic and semantic performance of the generated quantum circuits. When augmenting a 4B LLM, QUASAR achieves a validity of 99.31% at Pass@1 and 100% at Pass@10, outperforming the industrial LLMs GPT-4o, GPT-5, and DeepSeek-V3 as well as several supervised fine-tuning (SFT)-only and RL-only baselines.
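The abstract does not state how Pass@1 and Pass@10 are computed. As a minimal sketch, assuming the commonly used unbiased estimator 1 - C(n-c, k) / C(n, k) over n sampled circuits per task of which c are valid, the metric could be computed as follows. The function names and the sample counts are hypothetical and are not taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate Pass@k for one task from n samples, c of which are valid.

    Assumption: the standard unbiased estimator 1 - C(n-c, k) / C(n, k);
    the paper may compute its validity metric differently.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # With fewer than k invalid samples, every size-k subset contains a valid one.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(counts: list[tuple[int, int]], k: int) -> float:
    """Average Pass@k over tasks; counts holds one (n, c) pair per task."""
    return sum(pass_at_k(n, c, k) for n, c in counts) / len(counts)


# Hypothetical example: two tasks with 10 sampled circuits each.
example_counts = [(10, 10), (10, 5)]
pass1 = mean_pass_at_k(example_counts, k=1)
pass10 = mean_pass_at_k(example_counts, k=10)
```

Under this estimator, Pass@1 reduces to the fraction of valid samples, while Pass@10 rewards a model for producing at least one valid circuit among ten attempts.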
