Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Abstract

Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The model is trained to generate videos of realistic or imaginative scenes from text instructions and shows potential in simulating the physical world. Based on public technical reports and reverse engineering, this paper presents a comprehensive review of the model's background, related technologies, applications, remaining challenges, and future directions of text-to-video AI models. We first trace Sora's development and investigate the underlying technologies used to build this "world simulator". Then, we describe in detail the applications and potential impact of Sora in multiple industries ranging from film-making and education to marketing. We discuss the main challenges and limitations that need to be addressed before Sora can be widely deployed, such as ensuring safe and unbiased video generation. Lastly, we discuss the future development of Sora and video generation models in general, and how advancements in the field could enable new ways of human-AI interaction, boosting the productivity and creativity of video generation.
