WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation

Abstract

Knowledge discovery and collection are intelligence-intensive tasks that traditionally require significant human effort to ensure high-quality outputs. Recent research has explored multi-agent frameworks that automate Wikipedia-style article generation by retrieving and synthesizing information from the internet. However, these methods focus primarily on text-only generation and overlook the role of multimodal content in making articles more informative and engaging. In this work, we introduce WikiAutoGen, a novel system for automated multimodal Wikipedia-style article generation. Unlike prior approaches, WikiAutoGen retrieves and integrates relevant images alongside text, enriching both the depth and the visual appeal of the generated content. To further improve factual accuracy and comprehensiveness, we propose a multi-perspective self-reflection mechanism that critically assesses retrieved content from diverse viewpoints to enhance its reliability, breadth, and coherence, among other qualities. Additionally, we introduce WikiSeek, a benchmark of Wikipedia articles whose topics are paired with both textual and image-based representations, designed to evaluate multimodal knowledge generation on more challenging topics. Experimental results show that WikiAutoGen outperforms previous methods by 8% to 29% on our WikiSeek benchmark, producing more accurate, coherent, and visually enriched Wikipedia-style articles. Some generated examples are available at https://wikiautogen.github.io/.
