QUASAR: Quantum Assembly Code Generation Using Tool-Augmented LLMs via Agentic RL
October 1, 2025
Authors: Cong Yu, Valter Uotila, Shilong Deng, Qingyuan Wu, Tuo Shi, Songlin Jiang, Lei You, Bo Zhao
cs.AI
Abstract
Designing and optimizing task-specific quantum circuits is crucial for
leveraging the advantages of quantum computing. Recent large language model
(LLM)-based quantum circuit generation has emerged as a promising automatic
solution. However, two fundamental challenges remain unaddressed: (i)
parameterized quantum gates require precise numerical values for optimal
performance, and these values also depend on multiple factors, including the
number of quantum gates, their parameters, and the layout/depth of the
circuits; (ii) LLMs often generate low-quality or incorrect quantum circuits
because they lack quantum domain-specific knowledge. We propose QUASAR, an agentic
reinforcement learning (RL) framework for quantum circuit generation and
optimization based on tool-augmented LLMs. To align the LLM with
quantum-specific knowledge and improve the generated quantum circuits, QUASAR
designs (i) a quantum circuit verification approach with external quantum
simulators and (ii) a sophisticated hierarchical reward mechanism in RL
training. Extensive evaluation shows improvements in both the syntactic and
semantic performance of the generated quantum circuits. When augmenting a
4B-parameter LLM, QUASAR achieves a validity of 99.31% in Pass@1 and 100% in
Pass@10, outperforming industrial LLMs such as GPT-4o, GPT-5, and DeepSeek-V3,
as well as several supervised fine-tuning (SFT)-only and RL-only baselines.
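
For readers unfamiliar with the Pass@1 and Pass@10 figures quoted above, the following is a minimal sketch of how a Pass@k validity score is commonly computed. It assumes the widely used unbiased estimator 1 - C(n-c, k) / C(n, k), where n circuits are sampled per prompt and c of them pass the validity check. The abstract does not state which estimator QUASAR uses, so the function names and the sample counts below are illustrative assumptions, not the authors' evaluation code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate Pass@k for one prompt.

    n: number of circuits sampled for the prompt (assumed setup)
    c: number of those circuits that passed the validity check
    k: sample budget being scored (e.g. 1 or 10)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # If fewer than k samples failed, every size-k subset contains a valid one.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains at least one valid sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average Pass@k over prompts; results holds hypothetical (n, c) pairs."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical counts for three prompts, 10 samples each (not from the paper).
    demo = [(10, 10), (10, 9), (10, 5)]
    print(f"Pass@1  = {mean_pass_at_k(demo, 1):.4f}")
    print(f"Pass@10 = {mean_pass_at_k(demo, 10):.4f}")
```

With k = 1 the estimator reduces to the fraction of valid samples, c / n, which is why Pass@1 is the stricter of the two reported numbers.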