AutoPresent: Designing Structured Visuals from Scratch
January 1, 2025
Authors: Jiaxin Ge, Zora Zhiruo Wang, Xuhui Zhou, Yi-Hao Peng, Sanjay Subramanian, Qinyue Tan, Maarten Sap, Alane Suhr, Daniel Fried, Graham Neubig, Trevor Darrell
cs.AI
Abstract
Designing structured visuals such as presentation slides is essential for
communication and requires both content-creation and visual-planning
skills. In this work, we tackle the challenge of automated slide generation,
where models produce slide presentations from natural language (NL)
instructions. We first introduce SlidesBench, the first benchmark for slide
generation, with 7k training and 585 test examples derived from 310
slide decks across 10 domains. SlidesBench supports evaluations that are
(i) reference-based, measuring similarity to a target slide, and
(ii) reference-free, measuring the design quality of generated slides alone. We
benchmark end-to-end image generation and program generation methods with a
variety of models, and find that programmatic methods produce higher-quality
slides in user-interactable formats. Building on the success of program
generation, we create AutoPresent, an 8B Llama-based model trained on 7k
instruction-code pairs for slide generation, which achieves results
comparable to the closed-source model GPT-4o. We further explore iterative
design refinement, where the model is tasked with refining its own output, and
find that this process improves slide quality. We hope that our work
will provide a basis for future work on generating structured visuals.
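Two ideas from the abstract can be made concrete in a few lines: a slide represented as a program (editable data) rather than pixels, and iterative self-refinement. The following is a toy sketch under stated assumptions, not AutoPresent's implementation or the SlidesBench API: a slide is a list of hypothetical `TextBox` records on an assumed 960 x 540 point canvas, and the rule-based `critique` and `revise` functions are hypothetical stand-ins for the model critiquing and revising its own output.

```python
from dataclasses import dataclass, replace

# Assumed canvas: a 16:9 slide measured in points (960 x 540).
SLIDE_W, SLIDE_H = 960, 540


@dataclass(frozen=True)
class TextBox:
    """Hypothetical element of a slide program: a text box placed in points."""
    text: str
    left: int
    top: int
    width: int
    height: int


def critique(boxes: list[TextBox]) -> list[tuple[int, str]]:
    """Flag boxes that fall off the canvas (stand-in for a model's self-critique)."""
    issues = []
    for i, b in enumerate(boxes):
        if (b.left < 0 or b.top < 0
                or b.left + b.width > SLIDE_W or b.top + b.height > SLIDE_H):
            issues.append((i, "out_of_bounds"))
    return issues


def revise(boxes: list[TextBox], issues: list[tuple[int, str]]) -> list[TextBox]:
    """Move or shrink each flagged box onto the canvas (stand-in for a model's revision)."""
    fixed = list(boxes)
    for i, issue in issues:
        if issue == "out_of_bounds":
            b = fixed[i]
            width, height = min(b.width, SLIDE_W), min(b.height, SLIDE_H)
            left = min(max(b.left, 0), SLIDE_W - width)
            top = min(max(b.top, 0), SLIDE_H - height)
            fixed[i] = replace(b, left=left, top=top, width=width, height=height)
    return fixed


def refine(boxes: list[TextBox], max_rounds: int = 3) -> list[TextBox]:
    """Alternate critique and revision until no issues remain or rounds run out."""
    for _ in range(max_rounds):
        issues = critique(boxes)
        if not issues:
            break
        boxes = revise(boxes, issues)
    return boxes


draft = [
    TextBox("AutoPresent", left=72, top=36, width=792, height=86),
    TextBox("Results", left=720, top=490, width=360, height=108),  # overflows right and bottom
]
final = refine(draft)
print(critique(draft))  # [(1, 'out_of_bounds')]
print(critique(final))  # []
print(final[1])  # TextBox(text='Results', left=600, top=432, width=360, height=108)
```

Because the slide is data rather than an image, each refinement step is a small edit to coordinates that a user could still adjust afterwards; in the paper's setting the critique and revision would come from the model itself rather than from fixed rules.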