ThoughtTrace: Interpreting Users' Thought Trajectories in Real-World Large Language Model Interactions

Abstract

Conversational AI has now reached billions of users, yet existing datasets capture only what people say, not what they think. We introduce ThoughtTrace, the first large-scale dataset that pairs real-world multi-turn human–AI conversations with users' self-reported thoughts: their reasons for sending prompts and their reactions to assistant responses. ThoughtTrace comprises 1,058 users, 2,155 conversations, 17,058 turns, and 10,174 thought annotations collected across 20 language models. Our analysis shows that ThoughtTrace captures long-horizon, topically diverse interactions, and that thoughts are semantically distinct from messages, difficult for frontier LLMs to infer from context, diverse in content, and tied to conversation stages. We further demonstrate the utility of thoughts for downstream modeling. First, thoughts improve user-behavior prediction when supplied as inference-time context. Second, thought-guided rewrites provide fine-grained alignment signals for training personalized assistants. Together, ThoughtTrace establishes user thoughts as a new data modality for studying the cognitive dynamics behind human–AI interaction and provides a foundation for building assistants that better understand and adapt to users' latent goals, preferences, and needs.
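To make the pairing of messages and thoughts concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the dataset's actual release format: the `Turn` and `Conversation` classes, their field names, and the `build_context` prompt template are all hypothetical. The sketch shows one way a user's reason (recorded before a prompt) and reaction (recorded after a response) could be interleaved into the context used to predict the user's next message, with a flag that drops the thoughts for comparison.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Turn:
    """One user-assistant exchange with optional self-reported thoughts (hypothetical schema)."""
    user_message: str
    assistant_response: str
    reason: Optional[str] = None    # hypothetical field: why the user sent this prompt
    reaction: Optional[str] = None  # hypothetical field: how the user reacted to the response


@dataclass
class Conversation:
    """A multi-turn conversation between one user and one model (hypothetical schema)."""
    user_id: str
    model: str
    turns: list[Turn] = field(default_factory=list)


def build_context(conv: Conversation, upto: int, include_thoughts: bool = True) -> str:
    """Render the first `upto` turns as a prompt for predicting the user's next message.

    When include_thoughts is False, thoughts are omitted so the same
    conversation can be compared with and without thought context.
    """
    lines = []
    for t in conv.turns[:upto]:
        # The reason precedes the message it motivated.
        if include_thoughts and t.reason:
            lines.append(f"[user thought: reason] {t.reason}")
        lines.append(f"User: {t.user_message}")
        lines.append(f"Assistant: {t.assistant_response}")
        # The reaction follows the response it refers to.
        if include_thoughts and t.reaction:
            lines.append(f"[user thought: reaction] {t.reaction}")
    lines.append("Predict the user's next message:")
    return "\n".join(lines)
```

For example, `build_context(conv, 3)` and `build_context(conv, 3, include_thoughts=False)` yield two prompts that differ only in the presence of thoughts, which is the kind of controlled comparison the inference-time-context claim implies. Placing each thought next to the message it annotates keeps its temporal position in the conversation explicit.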