Vector Prism: Animating Vector Graphics by Stratifying Semantic Structure
December 16, 2025
Authors: Jooyeol Yun, Jaegul Choo
cs.AI
Abstract
Scalable Vector Graphics (SVG) are central to modern web design, and the demand to animate them continues to grow as web environments become increasingly dynamic. Yet automating the animation of vector graphics remains challenging for vision-language models (VLMs), despite recent progress in code generation and motion planning. VLMs routinely mishandle SVGs because visually coherent parts are often fragmented into low-level shapes that offer little guidance on which elements should move together. In this paper, we introduce a framework that recovers the semantic structure required for reliable SVG animation and reveals the missing layer that current VLM systems overlook. This is achieved through a statistical aggregation of multiple weak part predictions, which allows the system to stably infer semantics from noisy predictions. By reorganizing SVGs into semantic groups, our approach enables VLMs to produce animations with far greater coherence. Our experiments demonstrate substantial gains over existing approaches, suggesting that semantic recovery is the key step that unlocks robust SVG animation and supports more interpretable interactions between VLMs and vector graphics.
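To make the idea of aggregating several weak part predictions concrete, the following is a minimal sketch rather than the authors' method. The abstract does not specify the aggregation rule, so plain majority voting is used here as a stand-in. The function names (`aggregate_labels`, `group_by_label`), the element ids, and the part labels are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def aggregate_labels(predictions):
    """Majority-vote a part label for each SVG element.

    `predictions` is a list of dicts, one per noisy prediction pass,
    each mapping an element id to a predicted part label.
    Majority voting is an assumption; the paper's aggregation may differ.
    """
    votes = defaultdict(Counter)
    for pass_labels in predictions:
        for element_id, label in pass_labels.items():
            votes[element_id][label] += 1
    # Keep the most frequent label per element.
    return {eid: counts.most_common(1)[0][0] for eid, counts in votes.items()}


def group_by_label(labels):
    """Collect element ids that share a label into one semantic group."""
    groups = defaultdict(list)
    for element_id, label in labels.items():
        groups[label].append(element_id)
    return dict(groups)


# Three hypothetical noisy passes over the same three low-level shapes.
predictions = [
    {"path1": "ear", "path2": "ear", "path3": "body"},
    {"path1": "ear", "path2": "body", "path3": "body"},
    {"path1": "ear", "path2": "ear", "path3": "tail"},
]

labels = aggregate_labels(predictions)
groups = group_by_label(labels)
print(groups)
```

Each resulting group lists the shapes that would be animated together, for example by wrapping them in a shared SVG `<g>` element before asking a VLM to write the animation.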