Vector Prism: Animating Vector Graphics by Stratifying Semantic Structure

December 16, 2025
Authors: Jooyeol Yun, Jaegul Choo
cs.AI

Abstract

Scalable Vector Graphics (SVG) are central to modern web design, and the demand to animate them continues to grow as web environments become increasingly dynamic. Yet automating the animation of vector graphics remains challenging for vision-language models (VLMs) despite recent progress in code generation and motion planning. VLMs routinely mishandle SVGs, since visually coherent parts are often fragmented into low-level shapes that offer little guidance on which elements should move together. In this paper, we introduce a framework that recovers the semantic structure required for reliable SVG animation and reveals the missing layer that current VLM systems overlook. This is achieved through a statistical aggregation of multiple weak part predictions, allowing the system to stably infer semantics from noisy predictions. By reorganizing SVGs into semantic groups, our approach enables VLMs to produce animations with far greater coherence. Our experiments demonstrate substantial gains over existing approaches, suggesting that semantic recovery is the key step that unlocks robust SVG animation and supports more interpretable interactions between VLMs and vector graphics.