Causal Discovery in the Age of Agents

Abstract

Recent attempts to combine large language models (LLMs) with causal discovery ask models to infer pairwise directions, propose graph structures, or inject language-model outputs as priors and constraints. These approaches promise faster analysis, but they also obscure whether causal evidence stems from data and assumptions or from textual associations, prompt artifacts, and hallucinated mechanisms. We argue for a different role for agents in causal discovery. Agents should inspect data, retrieve context, explain method assumptions, and clarify graph outputs, but they should not supply edges, orientations, priors, constraints, or causal conclusions. We propose the principle that agents assist the workflow, while causal claims remain grounded in data, explicit assumptions, formal algorithms, diagnostics, and the decisions of users or domain experts. We instantiate this principle in causal-learn+, an online platform built around the causal-learn algorithmic ecosystem that coordinates data analysis, preprocessing, method recommendation, expert-knowledge incorporation, formal discovery, and interpretation. A case study on Big Five personality data illustrates how an agent-assisted causal discovery pipeline can be realized without turning language-model unreliability into causal evidence. The platform is available at causallearn.com.