WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation

Abstract

Knowledge discovery and collection are intelligence-intensive tasks that traditionally require significant human effort to ensure high-quality outputs. Recent research has explored multi-agent frameworks that automate Wikipedia-style article generation by retrieving and synthesizing information from the internet. However, these methods focus primarily on text-only generation and overlook the role of multimodal content in making articles more informative and engaging. In this work, we introduce WikiAutoGen, a novel system for automated multimodal Wikipedia-style article generation. Unlike prior approaches, WikiAutoGen retrieves and integrates relevant images alongside text, enriching both the depth and the visual appeal of the generated content. To further improve factual accuracy and comprehensiveness, we propose a multi-perspective self-reflection mechanism, which critically assesses retrieved content from diverse viewpoints to enhance its reliability, breadth, and coherence. Additionally, we introduce WikiSeek, a benchmark of Wikipedia articles whose topics are paired with both textual and image-based representations, designed to evaluate multimodal knowledge generation on more challenging topics. Experimental results show that WikiAutoGen outperforms previous methods by 8%-29% on our WikiSeek benchmark, producing more accurate, coherent, and visually enriched Wikipedia-style articles. Some of our generated examples are available at https://wikiautogen.github.io/.
