ThoughtTrace: Understanding User Thoughts in Real-World LLM Interactions

Abstract

Conversational AI has now reached billions of users, yet existing datasets capture only what people say, not what they think. We introduce ThoughtTrace, the first large-scale dataset that pairs real-world multi-turn human-AI conversations with users' self-reported thoughts: their reasons for sending prompts and reactions to assistant responses. ThoughtTrace comprises 1,058 users, 2,155 conversations, 17,058 turns, and 10,174 thought annotations collected across 20 language models. Our analysis shows that ThoughtTrace captures long-horizon, topically diverse interactions, and that thoughts are semantically distinct from messages, difficult for frontier LLMs to infer from context, diverse in content, and tied to conversation stages. We further demonstrate the utility of thoughts for downstream modeling. First, thoughts improve user-behavior prediction as inference-time context. Second, thought-guided rewrites provide fine-grained alignment signals for training personalized assistants. Together, ThoughtTrace establishes user thoughts as a new data modality for studying the cognitive dynamics behind human-AI interaction and provides a foundation for building assistants that better understand and adapt to users' latent goals, preferences, and needs.