Superpositional Gradient Descent: Harnessing Quantum Principles for Model Training

November 1, 2025
Authors: Ahmet Erdem Pamuk, Emir Kaan Özdemir, Şuayp Talha Kocabay
cs.AI

Abstract

Large language models (LLMs) are increasingly trained with classical optimization techniques like AdamW to improve convergence and generalization. However, the mechanisms by which quantum-inspired methods enhance classical training remain underexplored. We introduce Superpositional Gradient Descent (SGD), a novel optimizer linking gradient updates with quantum superposition by injecting quantum circuit perturbations. We present a mathematical framework and implement hybrid quantum-classical circuits in PyTorch and Qiskit. On synthetic sequence classification and large-scale LLM fine-tuning, SGD converges faster and yields lower final loss than AdamW. Despite promising results, scalability and hardware constraints limit adoption. Overall, this work provides new insights into the intersection of quantum computing and deep learning, suggesting practical pathways for leveraging quantum principles to control and enhance model behavior.
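The abstract does not spell out the update rule, so the following is only a minimal sketch of the general idea, assuming a hybrid setup in which each gradient step is perturbed by a value sampled from a small parameterized quantum circuit. The class name QuantumPerturbedSGD, the one-qubit H followed by RY(theta) circuit, and the Gaussian-noise coupling are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: a PyTorch optimizer whose gradient step is
# perturbed by noise scaled with the |1> probability of a small Qiskit
# circuit. The circuit layout and hyperparameters are assumptions.
import numpy as np
import torch
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector


def quantum_perturbation_scale(theta: float) -> float:
    """Build a 1-qubit circuit H -> RY(theta) and return P(|1>) in [0, 1]."""
    qc = QuantumCircuit(1)
    qc.h(0)            # place the qubit in superposition
    qc.ry(theta, 0)    # tunable rotation controlling the perturbation
    probs = Statevector.from_instruction(qc).probabilities()
    return float(probs[1])


class QuantumPerturbedSGD(torch.optim.Optimizer):
    """Plain gradient descent plus a quantum-scaled Gaussian perturbation."""

    def __init__(self, params, lr=1e-3, noise_scale=1e-4, theta=np.pi / 4):
        defaults = dict(lr=lr, noise_scale=noise_scale, theta=theta)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            # One circuit evaluation per parameter group keeps the sketch cheap.
            scale = quantum_perturbation_scale(group["theta"])
            for p in group["params"]:
                if p.grad is None:
                    continue
                noise = torch.randn_like(p) * group["noise_scale"] * scale
                p.add_(-group["lr"] * p.grad + noise)
        return loss
```

Under these assumptions the optimizer is a drop-in replacement for a standard PyTorch optimizer: construct it with model.parameters() and call zero_grad(), backward(), and step() as usual. The paper's optimizer presumably couples the circuit output to the update in a more principled way (and would use an AdamW-style moment-based baseline rather than plain gradient descent).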