Causal Discovery in the Era of Agents

Abstract

Recent attempts to combine large language models (LLMs) with causal discovery ask models to infer pairwise directions, propose graph structures, or inject language-model outputs as priors and constraints. These approaches promise faster analysis, but they also obscure whether causal evidence is supported by data and assumptions or by textual associations, prompt artifacts and hallucinated mechanisms. We argue that agents should play a different role in causal discovery. Agents should inspect data, retrieve context, explain method assumptions and clarify graph outputs, but they should not supply edges, orientations, priors, constraints or causal conclusions. We propose the principle that agents assist the workflow, while causal claims remain grounded in data, explicit assumptions, formal algorithms, diagnostics and the decisions of users or domain experts. We instantiate this principle in causal-learn+, an online platform that coordinates data analysis, preprocessing, method recommendation, expert-knowledge incorporation, formal discovery and interpretation around the algorithmic ecosystem of causal-learn. A case study on Big Five personality data illustrates an agent-assisted causal discovery pipeline that does not turn language-model unreliability into causal evidence. The platform is available at causallearn.com.
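The separation between agent assistance and causal inputs can be made concrete as a provenance check. The following is a minimal, hypothetical sketch in stdlib-only Python: the names `Provenance`, `EdgeConstraint` and `DiscoveryRun` are illustrative assumptions, not part of causal-learn or causal-learn+, and the variable names are placeholders. It shows one way a pipeline might store agent text for the user to read while refusing it as an edge constraint, so that only constraints recorded as explicit expert decisions could reach a formal search.

```python
from dataclasses import dataclass, field
from enum import Enum


class Provenance(Enum):
    """Hypothetical provenance labels for anything that may touch a causal graph."""
    ALGORITHM = "algorithm"  # produced by a formal discovery method run on data
    EXPERT = "expert"        # an explicit decision by a user or domain expert
    AGENT = "agent"          # text generated by a language-model agent


@dataclass(frozen=True)
class EdgeConstraint:
    kind: str  # "required" or "forbidden"
    cause: str
    effect: str
    provenance: Provenance


@dataclass
class DiscoveryRun:
    """Keeps causal inputs and agent commentary in separate fields."""
    constraints: list = field(default_factory=list)  # may influence the search
    agent_notes: list = field(default_factory=list)  # explanations only

    def add_constraint(self, c: EdgeConstraint) -> None:
        # Only expert decisions may constrain the search; anything else,
        # including agent output, is rejected rather than silently accepted.
        if c.provenance is not Provenance.EXPERT:
            raise ValueError(
                f"constraint {c.cause}->{c.effect} has provenance "
                f"{c.provenance.value}; only expert decisions may constrain the search"
            )
        if c.kind not in ("required", "forbidden"):
            raise ValueError(f"unknown constraint kind: {c.kind}")
        self.constraints.append(c)

    def add_agent_note(self, text: str) -> None:
        # Agent text is kept for the user to read; it is never passed on as a constraint.
        self.agent_notes.append(text)


if __name__ == "__main__":
    run = DiscoveryRun()
    run.add_constraint(
        EdgeConstraint("forbidden", "Extraversion", "Neuroticism", Provenance.EXPERT)
    )
    run.add_agent_note("Fisher-z tests assume roughly linear-Gaussian relations.")
    try:
        run.add_constraint(
            EdgeConstraint("required", "Neuroticism", "Extraversion", Provenance.AGENT)
        )
    except ValueError as err:
        print(err)
    print(len(run.constraints), len(run.agent_notes))
```

Rejecting agent-sourced constraints with an error, rather than filtering them silently, makes any attempt to route language-model output into the search visible to the user. In a real pipeline, the accepted `run.constraints` would still need to be translated into whatever background-knowledge mechanism the chosen algorithm supports, for example causal-learn's `BackgroundKnowledge` class. That step is not shown here.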