

QUASAR: Quantum Assembly Code Generation Using Tool-Augmented LLMs via Agentic RL

October 1, 2025
Authors: Cong Yu, Valter Uotila, Shilong Deng, Qingyuan Wu, Tuo Shi, Songlin Jiang, Lei You, Bo Zhao
cs.AI

Abstract

Designing and optimizing task-specific quantum circuits is crucial to leveraging the advantages of quantum computing. Recent large language model (LLM)-based quantum circuit generation has emerged as a promising automatic solution. However, fundamental challenges remain unaddressed: (i) parameterized quantum gates require precise numerical values for optimal performance, and these values depend on multiple factors, including the number of quantum gates, their parameters, and the layout/depth of the circuit; (ii) LLMs often generate low-quality or incorrect quantum circuits because they lack quantum domain-specific knowledge. We propose QUASAR, an agentic reinforcement learning (RL) framework for quantum circuit generation and optimization based on tool-augmented LLMs. To align the LLM with quantum-specific knowledge and improve the quality of the generated circuits, QUASAR introduces (i) a quantum circuit verification approach using external quantum simulators and (ii) a hierarchical reward mechanism for RL training. Extensive evaluation shows improvements in both the syntactic and semantic quality of the generated quantum circuits. When augmenting a 4B-parameter LLM, QUASAR achieves 99.31% validity at Pass@1 and 100% at Pass@10, outperforming industrial LLMs such as GPT-4o, GPT-5, and DeepSeek-V3, as well as several supervised fine-tuning (SFT)-only and RL-only baselines.
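The abstract names two ingredients: verification of generated circuits with an external quantum simulator and a hierarchical reward for RL training. The sketch below is a minimal illustration of how such a verifier-plus-reward step could look using Qiskit; the two-level split (syntax first, then semantics) mirrors the abstract, while the fidelity-based semantic score, the 0.25/0.75 weighting, and the `hierarchical_reward` helper are illustrative assumptions, not the paper's actual reward design.

```python
# Hedged sketch (not the authors' implementation): check an LLM-generated
# OpenQASM 2.0 string with a classical simulator and map the outcome to a
# two-level (syntax, then semantics) scalar reward.
import numpy as np
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector, state_fidelity


def hierarchical_reward(qasm_code: str, reference_state: Statevector) -> float:
    """Return a scalar reward for a generated OpenQASM 2.0 circuit."""
    # Level 1: syntactic validity -- does the code parse into a circuit?
    try:
        circuit = QuantumCircuit.from_qasm_str(qasm_code)
    except Exception:
        return 0.0  # unparsable output earns no reward

    # Level 2: semantic quality -- simulate the circuit and compare its
    # output state against a task-specific reference (assumed available).
    try:
        unitary_part = circuit.remove_final_measurements(inplace=False)
        produced_state = Statevector.from_instruction(unitary_part)
    except Exception:
        return 0.25  # parses but fails to simulate: syntax-only reward

    fidelity = state_fidelity(produced_state, reference_state)
    return 0.25 + 0.75 * fidelity  # syntax bonus plus fidelity-weighted term


# Usage example: score a generated Bell-state circuit against the ideal
# Bell state (|00> + |11>) / sqrt(2).
qasm = """OPENQASM 2.0;
include "qelib1.inc";
qreg q[2];
h q[0];
cx q[0], q[1];
"""
bell = Statevector(np.array([1, 0, 0, 1]) / np.sqrt(2))
print(hierarchical_reward(qasm, bell))  # ~1.0 for a correct circuit
```

In an agentic RL loop of this kind, the simulator call acts as the external tool: syntactically invalid generations are cut off early with zero reward, while valid circuits receive a graded signal tied to how well their output state matches the target.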