SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing

March 6, 2025
Authors: Xiangchao Yan, Shiyang Feng, Jiakang Yuan, Renqiu Xia, Bin Wang, Bo Zhang, Lei Bai
cs.AI

Abstract

Survey papers play a crucial role in scientific research, especially given the rapid growth of research publications. Recently, researchers have begun using large language models (LLMs) to automate survey generation for better efficiency. However, the quality gap between LLM-generated surveys and those written by humans remains significant, particularly in outline quality and citation accuracy. To close these gaps, we introduce SurveyForge, which first generates the outline by analyzing the logical structure of human-written outlines and referring to retrieved domain-related articles. Subsequently, leveraging high-quality papers retrieved from memory by our scholar navigation agent, SurveyForge automatically generates and refines the content of the article. Moreover, to achieve a comprehensive evaluation, we construct SurveyBench, which includes 100 human-written survey papers for win-rate comparison and assesses AI-generated survey papers across three dimensions: reference, outline, and content quality. Experiments demonstrate that SurveyForge can outperform previous works such as AutoSurvey.