Causal Discovery in the Age of Agents

Abstract

Recent attempts to combine large language models (LLMs) with causal discovery ask models to infer pairwise causal directions, propose graph structures, or inject language-model outputs as priors and constraints. These approaches promise faster analysis, but they also obscure whether a causal claim is supported by data and assumptions or by textual associations, prompt artifacts and hallucinated mechanisms. We argue for a different role for agents in causal discovery. Agents should inspect data, retrieve context, explain method assumptions and clarify graph outputs, but they should not supply edges, orientations, priors, constraints or causal conclusions. We propose the principle that agents assist the workflow, while causal claims remain grounded in data, explicit assumptions, formal algorithms, diagnostics and decisions made by users or domain experts. We instantiate this principle in causal-learn+, an online platform that coordinates data analysis, preprocessing, method recommendation, incorporation of expert knowledge, formal discovery and interpretation around the algorithmic ecosystem of causal-learn. A case study on Big Five personality data illustrates an agent-assisted causal discovery pipeline that does not turn language-model unreliability into causal evidence. The platform is available at causallearn.com.