Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories
June 9, 2026
Authors: Kevin Qinghong Lin, Batu El, Yuhong Shi, Pan Lu, Philip Torr, James Zou
cs.AI
Abstract
Data tells stories that shape society; the data journalist's job is to turn raw information into stories non-experts can trust. A high-quality news feature takes a newsroom team weeks: hunting for context, running statistics, choosing an angle, and designing visuals. Recent agents handle individual steps well: data-science agents close the analysis loop, while design agents synthesize beautiful websites. But can an agent serve as a data journalist end to end? We introduce Data Journalist Agent (Data2Story), a multi-agent framework that orchestrates specialized roles into a single virtual newsroom. Data2Story contributes two innovations. (i) Claims are evidence-grounded: an Inspector links every number, angle, and asset back to data, code, or an external reference. (ii) Articles are multimodally generative: rather than defaulting to plain text and static charts, Data2Story reasons about what readers will want to see, then deploys multimodal tools, such as interactive maps for geography and audio for music. We evaluate Data2Story on 18 articles, each paired with the originally published expert piece, along four axes: (a) human-agent angle coverage; (b) rubric evaluation with 53 participants across five dimensions; (c) computer-use agents as judges, a cost-saving proxy for how readers navigate interactive articles; and (d) verifiability, where a coding verifier re-executes statements against the data and checks claims against references. Data2Story produces competitive, evidence-traceable multimedia stories, with particular strength in transparency and auditability. Human articles retain an edge in editorial angle, creative design, and presentation. We position Data2Story as a collaborator for journalists, enabling more evidence-based, transparent, and verifiable reporting. Code and demos are available at https://data2story.github.io.
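To make the verifiability idea concrete, the following is a minimal sketch of re-executing a numeric statement against the data and comparing the result with the value stated in the article. It is an illustration under assumptions, not the authors' implementation: the names `Claim`, `verify_claim`, and `audit`, the tolerance, and the toy rent data are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]  # one record of the dataset (hypothetical schema)


@dataclass
class Claim:
    """A numeric statement from an article, linked to the code that backs it."""
    text: str                                  # sentence as it appears in the story
    stated_value: float                        # number quoted in the story
    compute: Callable[[Sequence[Row]], float]  # code that should reproduce it


def verify_claim(claim: Claim, rows: Sequence[Row], rel_tol: float = 0.01) -> bool:
    """Re-execute the claim's code on the data and compare with the stated value."""
    recomputed = claim.compute(rows)
    return math.isclose(recomputed, claim.stated_value, rel_tol=rel_tol)


def audit(claims: List[Claim], rows: Sequence[Row]) -> Dict[str, bool]:
    """Return a per-claim pass/fail report."""
    return {c.text: verify_claim(c, rows) for c in claims}


def mean_rent(rows: Sequence[Row]) -> float:
    """Average of the (hypothetical) 'rent' column."""
    return sum(float(r["rent"]) for r in rows) / len(rows)


# Toy data, invented purely for illustration.
rows = [
    {"city": "A", "rent": 1000.0},
    {"city": "B", "rent": 1500.0},
    {"city": "C", "rent": 2000.0},
]

claims = [
    Claim("Average rent is 1500", 1500.0, mean_rent),
    Claim("Average rent is 1800", 1800.0, mean_rent),
]

report = audit(claims, rows)
for text, ok in report.items():
    print(f"{'PASS' if ok else 'FAIL'}: {text}")
```

In this sketch each claim carries its own reproducing function, so an auditor can rerun it without trusting the prose. A fuller system would presumably also check non-numeric claims against external references, which this example does not attempt.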