Mind-Paced Speaking: A Dual-Brain Approach to Real-Time Reasoning in Spoken Language Models

October 10, 2025
Authors: Donghang Wu, Haoyang Zhang, Jun Chen, Xiangyu Zhang, Hexin Liu, Eng Siong Chng, Fei Tian, Xuerui Yang, Xiangyu Zhang, Daxin Jiang, Gang Yu
cs.AI

Abstract

Real-time Spoken Language Models (SLMs) struggle to leverage Chain-of-Thought (CoT) reasoning due to the prohibitive latency of generating the entire thought process sequentially. Enabling SLMs to think while speaking, similar to humans, is attracting increasing attention. We present, for the first time, Mind-Paced Speaking (MPS), a brain-inspired framework that enables high-fidelity, real-time reasoning. Similar to how humans utilize distinct brain regions for thinking and responding, we propose a novel dual-brain approach, employing a "Formulation Brain" for high-level reasoning to pace and guide a separate "Articulation Brain" for fluent speech generation. This division of labor eliminates mode-switching, preserving the integrity of the reasoning process. Experiments show that MPS significantly outperforms existing think-while-speaking methods and achieves reasoning performance comparable to models that pre-compute the full CoT before speaking, while drastically reducing latency. Under a zero-latency configuration, the proposed method achieves an accuracy of 92.8% on the mathematical reasoning task Spoken-MQA and attains a score of 82.5 on the speech conversation task URO-Bench. Our work effectively bridges the gap between high-quality reasoning and real-time interaction.
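
The latency argument in the abstract can be made concrete with a toy accounting of decoding steps. The sketch below is not the authors' implementation. It assumes, purely for illustration, that reasoning is produced in segments of known token length and that time is measured in generated tokens. The names `Segment`, `think_before_speak_latency`, `paced_latency`, and the `lead` parameter are all hypothetical. It compares how many reasoning tokens must be generated before the first spoken token when the full CoT is computed up front versus when speech is allowed to start after only `lead` reasoning segments, with `lead=0` standing in for a zero-latency configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One hypothetical chunk of reasoning and the speech it supports."""
    thought_tokens: int  # assumed number of reasoning tokens in this chunk
    speech_tokens: int   # assumed number of spoken tokens for this chunk


def think_before_speak_latency(segments: List[Segment]) -> int:
    """Reasoning tokens generated before the first spoken token
    when the entire chain of thought is produced first."""
    return sum(s.thought_tokens for s in segments)


def paced_latency(segments: List[Segment], lead: int = 0) -> int:
    """Reasoning tokens generated before the first spoken token
    when speech may begin after `lead` reasoning segments.
    lead=0 models speaking immediately (a zero-latency setting)."""
    if lead < 0:
        raise ValueError("lead must be non-negative")
    return sum(s.thought_tokens for s in segments[:lead])


# Illustrative numbers only; they are not taken from the paper.
example = [Segment(40, 10), Segment(60, 15), Segment(30, 8)]
full_cot_wait = think_before_speak_latency(example)
zero_latency_wait = paced_latency(example, lead=0)
one_segment_wait = paced_latency(example, lead=1)
```

With these made-up segment sizes, the think-before-speak wait grows with the total length of the reasoning, while the paced wait depends only on how many segments are required before speech begins. Whether answer quality holds up at small `lead` is the empirical question the reported Spoken-MQA and URO-Bench results address.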