Group Think: Multiple Concurrent Reasoning Agents Collaborating at Token Level Granularity
May 16, 2025
Authors: Chan-Jan Hsu, Davide Buffelli, Jamie McGowan, Feng-Ting Liao, Yi-Chang Chen, Sattar Vakili, Da-shan Shiu
cs.AI
Abstract
Recent advances in large language models (LLMs) have demonstrated the power
of reasoning through self-generated chains of thought. Multiple reasoning
agents can collaborate to raise joint reasoning quality above individual
outcomes. However, such agents typically interact in a turn-based manner,
trading increased latency for improved quality. In this paper, we propose Group
Think--a single LLM that acts as multiple concurrent reasoning agents, or
thinkers. With shared visibility into each other's partial generation progress,
Group Think introduces a new concurrent-reasoning paradigm in which multiple
reasoning trajectories adapt dynamically to one another at the token level. For
example, a reasoning thread may shift its generation mid-sentence upon
detecting that another thread is better positioned to continue. This
fine-grained, token-level collaboration enables Group Think to reduce redundant
reasoning and improve quality while achieving significantly lower latency.
Moreover, its concurrent nature allows for efficient utilization of idle
computational resources, making it especially suitable for edge inference,
where very small batch sizes often underutilize local GPUs. We give a simple
and generalizable modification that enables any existing LLM to perform Group
Think on a local GPU. We also present an evaluation strategy to benchmark
reasoning latency and empirically demonstrate latency improvements using
open-source LLMs that were not explicitly trained for Group Think. We hope this
work paves the way for future LLMs to exhibit more sophisticated and more
efficient collaborative behavior for higher quality generation.
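The decoding scheme the abstract describes — multiple reasoning threads generated by one model, each conditioned at every step on the partial output of all the others — can be illustrated with a short sketch. The code below is a minimal simulation, not the paper's implementation: `next_token` is a hypothetical stand-in for a single LLM decode step, and the round-robin scheduling, `<done>` stop token, and function names are all illustrative assumptions.

```python
from typing import Callable, List

def group_think_decode(
    next_token: Callable[[int, List[List[str]]], str],
    n_thinkers: int,
    max_tokens: int,
    stop: str = "<done>",
) -> List[List[str]]:
    """Round-robin, token-level concurrent decoding.

    At each step, thinker i produces its next token conditioned on the
    partial sequences of *all* thinkers (shared visibility), so a thread
    can adapt mid-sentence to what the others have already generated.
    """
    threads: List[List[str]] = [[] for _ in range(n_thinkers)]
    done = [False] * n_thinkers
    for _ in range(max_tokens):
        for i in range(n_thinkers):
            if done[i]:
                continue
            tok = next_token(i, threads)  # sees every thread's progress
            if tok == stop:
                done[i] = True
            else:
                threads[i].append(tok)
        if all(done):
            break
    return threads
```

In a real system, `next_token(i, threads)` would call the LLM with a prompt that interleaves all partial trajectories; here any callable with that signature works, e.g. a toy policy where each thinker emits three tokens and then stops:

```python
def toy_policy(i, threads):
    return "<done>" if len(threads[i]) >= 3 else f"thinker{i}-tok{len(threads[i])}"

out = group_think_decode(toy_policy, n_thinkers=2, max_tokens=10)
# each of the two threads ends up with exactly 3 tokens
```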