On-device Sora: Enabling Diffusion-Based Text-to-Video Generation for Mobile Devices
February 5, 2025
Authors: Bosung Kim, Kyuhwan Lee, Isu Jeong, Jungmin Cheon, Yeojin Lee, Seulki Lee
cs.AI
Abstract
We present On-device Sora, a pioneering solution for diffusion-based
on-device text-to-video generation that operates efficiently on
smartphone-grade devices. Building on Open-Sora, On-device Sora applies three
novel techniques to address the challenges of diffusion-based text-to-video
generation on computation- and memory-limited mobile devices. First, Linear
Proportional Leap (LPL) reduces the excessive denoising steps required in video
diffusion through an efficient leap-based approach. Second, Temporal Dimension
Token Merging (TDTM) minimizes intensive token-processing computation in
attention layers by merging consecutive tokens along the temporal dimension.
Third, Concurrent Inference with Dynamic Loading (CI-DL) dynamically partitions
large models into smaller blocks and loads them into memory for concurrent
model inference, effectively addressing the challenges of limited device
memory. We implement On-device Sora on the iPhone 15 Pro, and the experimental
evaluations demonstrate that it is capable of generating high-quality videos on
the device, comparable to those produced by Open-Sora running on high-end GPUs.
These results show that On-device Sora enables efficient and high-quality video
generation on resource-constrained mobile devices, expanding accessibility,
ensuring user privacy, reducing dependence on cloud infrastructure, and
lowering associated costs. We envision the proposed On-device Sora as a
significant first step toward democratizing state-of-the-art generative
technologies, enabling video generation capabilities on commodity mobile and
embedded devices. The code implementation is publicly available in a GitHub
repository: https://github.com/eai-lab/On-device-Sora.
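To make the idea behind Temporal Dimension Token Merging more concrete, the following is a minimal sketch and not the authors' implementation. It assumes tokens are laid out as (frames, tokens per frame, channels), that merging means averaging each pair of consecutive frames, and that the original frame count is restored by repeating each merged frame. The function names merge_temporal and unmerge_temporal are hypothetical, and the abstract does not specify the merge rule or merge ratio.

```python
import numpy as np


def merge_temporal(tokens: np.ndarray) -> np.ndarray:
    """Average each pair of consecutive frames: (T, S, D) -> (T // 2, S, D).

    Assumption: pairwise averaging is used as the merge rule; the abstract
    only states that consecutive tokens are merged along the temporal axis.
    """
    t, s, d = tokens.shape
    if t % 2 != 0:
        raise ValueError("this sketch assumes an even number of frames")
    # Group frames into consecutive pairs, then average within each pair.
    return tokens.reshape(t // 2, 2, s, d).mean(axis=1)


def unmerge_temporal(merged: np.ndarray) -> np.ndarray:
    """Restore the frame count by repeating each merged frame twice."""
    return np.repeat(merged, 2, axis=0)


# Toy input: 4 frames, 2 spatial tokens per frame, 3 channels.
frames = np.arange(4 * 2 * 3, dtype=np.float32).reshape(4, 2, 3)
merged = merge_temporal(frames)      # an attention layer would see 2 frames
restored = unmerge_temporal(merged)  # back to 4 frames for later layers
```

Under these assumptions, an attention layer placed between the two calls processes half as many tokens along the temporal axis, which is where the computational saving would come from.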