ThoughtTrace: Understanding Users' Thoughts in Real-World LLM Interactions

Abstract

Conversational AI now reaches billions of users, yet existing datasets capture only what people say, not what they think. We introduce ThoughtTrace, the first large-scale dataset that pairs real-world multi-turn human-AI conversations with users' self-reported thoughts: their reasons for sending prompts and their reactions to assistant responses. ThoughtTrace comprises 1,058 users, 2,155 conversations, 17,058 turns, and 10,174 thought annotations, collected across 20 language models. Our analysis shows that ThoughtTrace captures long-horizon, topically diverse interactions, and that thoughts are semantically distinct from messages, difficult for frontier LLMs to infer from context, diverse in content, and tied to conversation stages. We further demonstrate the utility of thoughts for downstream modeling. First, supplying thoughts as inference-time context improves user-behavior prediction. Second, thought-guided rewrites provide fine-grained alignment signals for training personalized assistants. Together, ThoughtTrace establishes user thoughts as a new data modality for studying the cognitive dynamics behind human-AI interaction and provides a foundation for building assistants that better understand and adapt to users' latent goals, preferences, and needs.