Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories
June 9, 2026
Authors: Kevin Qinghong Lin, Batu El, Yuhong Shi, Pan Lu, Philip Torr, James Zou
cs.AI
Abstract
Data tells stories that shape society; the data journalist's job is to turn raw information into stories non-experts can trust. A high-quality news feature takes a newsroom team weeks: hunting for context, running statistics, choosing an angle, and designing visuals. Recent agents handle individual steps well: data-science agents close the analysis loop, while design agents synthesize beautiful websites. But can an agent serve as a data journalist end to end? We introduce Data Journalist Agent (Data2Story), a multi-agent framework that orchestrates specialized roles into a single virtual newsroom. Data2Story contributes two innovations. (i) Claims are evidence-grounded: an Inspector links every number, angle, and asset back to data, code, or an external reference. (ii) Articles are multimodally generative: rather than defaulting to plain text and static charts, Data2Story reasons about what readers will want to see, then deploys multimodal tools, such as interactive maps for geography and audio for music. We evaluate Data2Story on 18 articles, each paired with the originally published expert piece, along four axes: (a) human-agent angle coverage; (b) rubric evaluation with 53 participants across five dimensions; (c) computer-use agents as judges, a cost-saving proxy for how readers navigate interactive articles; and (d) verifiability, where a coding verifier re-executes statements against the data and checks claims against references. Data2Story produces competitive, evidence-traceable multimedia stories, with particular strength in transparency and auditability. Human-written articles retain an edge in editorial angle, creative design, and presentation. We position Data2Story as a collaborator for journalists, enabling more evidence-based, transparent, and verifiable reporting. Code and demos are available at https://data2story.github.io.
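The abstract does not say how the Inspector's evidence links or the coding verifier are implemented. Here is a minimal sketch of the general idea of re-execution-based verification. It assumes that each numeric claim carries a pointer to the code that produced it and that reference-backed claims carry a source link. `Claim`, `verify_claim`, the status labels, the 1% tolerance, and the toy streaming data are all hypothetical illustrations, not Data2Story's actual API or data.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional


# Hypothetical evidence record: a claim is backed either by code that
# recomputes it from the data (kind="code") or by an external reference
# (kind="reference"). Field names are illustrative assumptions.
@dataclass
class Claim:
    text: str
    claimed_value: Optional[float]
    kind: str  # "code" or "reference"
    compute: Optional[Callable[[list[dict]], float]] = None
    source_url: Optional[str] = None


def verify_claim(claim: Claim, rows: list[dict], rel_tol: float = 0.01) -> str:
    """Re-execute a code-backed claim against the data; flag untraceable claims."""
    if claim.kind == "code":
        if claim.compute is None or claim.claimed_value is None:
            return "unverifiable"
        recomputed = claim.compute(rows)
        # Accept small rounding differences (assumed 1% relative tolerance).
        ok = math.isclose(recomputed, claim.claimed_value, rel_tol=rel_tol)
        return "supported" if ok else "contradicted"
    if claim.kind == "reference":
        # Checking text against a reference needs a separate reading step
        # (e.g. a model-based judge); this sketch only checks a link exists.
        return "needs_reference_check" if claim.source_url else "unverifiable"
    return "unverifiable"


# Toy data, invented for illustration only.
rows = [{"year": 2020, "streams": 120.0}, {"year": 2021, "streams": 150.0}]
growth = Claim(
    text="Streams grew 25% from 2020 to 2021.",
    claimed_value=0.25,
    kind="code",
    compute=lambda r: r[1]["streams"] / r[0]["streams"] - 1,
)
print(verify_claim(growth, rows))  # supported
```

Because the check recomputes the number rather than trusting the prose, a claim stating 30% growth on the same data would be reported as `contradicted`. A claim with no code and no source would be reported as `unverifiable`, which is the kind of gap an auditor would want surfaced.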