AsyncVoice Agent: Real-Time Explanation for LLM Planning and Reasoning
October 17, 2025
Authors: Yueqian Lin, Zhengmian Hu, Jayakumar Subramanian, Qinsi Wang, Nikos Vlassis, Hai "Helen" Li, Yiran Chen
cs.AI
Abstract
Effective human-AI collaboration on complex reasoning tasks requires that
users understand and interact with the model's process, not just receive an
output. However, the monolithic text from methods like Chain-of-Thought (CoT)
prevents this, as current interfaces lack real-time verbalization and robust
user barge-in. We present AsyncVoice Agent, a system whose asynchronous
architecture decouples a streaming LLM backend from a conversational voice
frontend. This design allows narration and inference to run in parallel,
empowering users to interrupt, query, and steer the model's reasoning process
at any time. Objective benchmarks show this approach reduces interaction
latency by more than 600x compared to monolithic baselines while ensuring high
fidelity and competitive task accuracy. By enabling a two-way dialogue with a
model's thought process, AsyncVoice Agent offers a new paradigm for building
more effective, steerable, and trustworthy human-AI systems for high-stakes
tasks.
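
The following is a minimal sketch, not the authors' implementation, of the decoupling idea the abstract describes: one coroutine stands in for the streaming LLM backend and pushes reasoning tokens into a queue, while a separate voice-frontend coroutine narrates them and honors a barge-in signal without blocking inference. All names (reasoning_backend, voice_frontend, barge_in) and the simulated token stream are illustrative assumptions.

```python
import asyncio

# Hypothetical token stream standing in for a streaming LLM backend.
REASONING_TOKENS = [
    "First,", "decompose", "the", "problem", "into", "subgoals.",
    "Next,", "evaluate", "each", "subgoal", "against", "constraints.",
]


async def reasoning_backend(trace: asyncio.Queue) -> None:
    """Streams reasoning tokens; it never waits for narration to catch up."""
    for token in REASONING_TOKENS:
        await asyncio.sleep(0.05)   # simulated per-token generation latency
        await trace.put(token)
    await trace.put(None)           # sentinel: reasoning trace is complete


async def voice_frontend(trace: asyncio.Queue, barge_in: asyncio.Event) -> None:
    """Narrates the trace as it arrives and yields immediately on barge-in."""
    while True:
        token = await trace.get()
        if token is None:
            break
        if barge_in.is_set():
            print("[frontend] pausing narration to handle the user's query...")
            barge_in.clear()
        print(f"[frontend] narrating: {token}")
        await asyncio.sleep(0.02)   # simulated TTS playback slice


async def user(barge_in: asyncio.Event) -> None:
    """Simulated user who interrupts while reasoning is still in progress."""
    await asyncio.sleep(0.2)
    print("[user] barge-in: 'why that subgoal?'")
    barge_in.set()


async def main() -> None:
    trace: asyncio.Queue = asyncio.Queue()
    barge_in = asyncio.Event()
    # Backend, frontend, and user run concurrently: narration and user
    # interruptions never block the underlying inference stream.
    await asyncio.gather(
        reasoning_backend(trace),
        voice_frontend(trace, barge_in),
        user(barge_in),
    )


if __name__ == "__main__":
    asyncio.run(main())
```

In this toy setup, the queue is what decouples the two sides: the backend keeps producing tokens at its own pace, so the interaction latency for a barge-in is bounded by one narration slice rather than by the full reasoning trace, which is the property the paper's latency comparison targets.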