ChatPaper.ai

Daily AI Research Paper Picks

A daily selection of AI research papers, with translations

1

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Mar 8
ByMachel Reid, Nikolay Savinov, Denis Teplyashin, Dmitry Lepikhin, Timothy Lillicrap, Jean-baptiste Alayrac, Radu Soricut, Angeliki Lazaridou, Orhan Firat, Julian Schrittwieser, Ioannis Antonoglou, Rohan Anil, Sebastian Borgeaud, Andrew Dai, Katie Millican, Ethan Dyer, Mia Glaese, Thibault Sottiaux, Benjamin Lee, Fabio Viola, Malcolm Reynolds, Yuanzhong Xu, James Molloy, Jilin Chen, Michael Isard, Paul Barham, Tom Hennigan, Ross McIlroy, Melvin Johnson, Johan Schalkwyk, Eli Collins, Eliza Rutherford, Erica Moreira, Kareem Ayoub, Megha Goel, Clemens Meyer, Gregory Thornton, Zhen Yang, Henryk Michalewski, Zaheer Abbas, Nathan Schucher, Ankesh Anand, Richard Ives, James Keeling, Karel Lenc, Salem Haykal, Siamak Shakeri, Pranav Shyam, Aakanksha Chowdhery, Roman Ring, Stephen Spencer, Eren Sezener, Luke Vilnis, Oscar Chang, Nobuyuki Morioka, George Tucker, Ce Zheng, Oliver Woodman, Nithya Attaluri, Tomas Kocisky, Evgenii Eltyshev, Xi Chen, Timothy Chung, Vittorio Selo, Siddhartha Brahma, Petko Georgiev, Ambrose Slone, Zhenkai Zhu, James Lottes, Siyuan Qiao, Ben Caine, Sebastian Riedel, Alex Tomala, Martin Chadwick, Juliette Love, Peter Choy, Sid Mittal, Neil Houlsby, Yunhao Tang, Matthew Lamm, Libin Bai, Qiao Zhang, Luheng He, Yong Cheng, Peter Humphreys, Yujia Li, Sergey Brin, Albin Cassirer, Yingjie Miao, Lukas Zilka, Taylor Tobin, Kelvin Xu, Lev Proleev, Daniel Sohn, Alberto Magni, Lisa Anne Hendricks, Isabel Gao, Santiago Ontañón, Oskar Bunyan, Nathan Byrd, Abhanshu Sharma, Biao Zhang, Mario Pinto, Rishika Sinha, Harsh Mehta, Dawei Jia, Sergi Caelles, Albert Webson, Alex Morris, Becca Roelofs, Yifan Ding, Robin Strudel, Xuehan Xiong, Marvin Ritter, Mostafa Dehghani, Rahma Chaabouni, Abhijit Karmarkar, Guangda Lai, Fabian Mentzer, Bibo Xu, YaGuang Li, Yujing Zhang, Tom Le Paine, Alex Goldin, Behnam Neyshabur, Kate Baumli, Anselm Levskaya, Michael Laskin, Wenhao Jia, Jack W. 
Rae, Kefan Xiao, Antoine He, Skye Giordano, Lakshman Yagati, Jean-Baptiste Lespiau, Paul Natsev, Sanjay Ganapathy, Fangyu Liu, Danilo Martins, Nanxin Chen, Yunhan Xu, Megan Barnes, Rhys May, Arpi Vezer, Junhyuk Oh, Ken Franko, Sophie Bridgers, Ruizhe Zhao, Boxi Wu, Basil Mustafa, Sean Sechrist, Emilio Parisotto, Thanumalayan Sankaranarayana Pillai, Chris Larkin, Chenjie Gu, Christina Sorokin, Maxim Krikun, Alexey Guseynov, Jessica Landon, Romina Datta, Alexander Pritzel, Phoebe Thacker, Fan Yang, Kevin Hui, Anja Hauth, Chih-Kuan Yeh, David Barker, Justin Mao-Jones, Sophia Austin, Hannah Sheahan, Parker Schuh, James Svensson, Rohan Jain, Vinay Ramasesh, Anton Briukhov, Da-Woon Chung, Tamara von Glehn, Christina Butterfield, Priya Jhakra, Matthew Wiethoff, Justin Frye, Jordan Grimstad, Beer Changpinyo, Charline Le Lan, Anna Bortsova, Yonghui Wu, Paul Voigtlaender, Tara Sainath, Charlotte Smith, Will Hawkins, Kris Cao, James Besley, Srivatsan Srinivasan, Mark Omernick, Colin Gaffney, Gabriela Surita, Ryan Burnell, Bogdan Damoc, Junwhan Ahn, Andrew Brock, Mantas Pajarskas, Anastasia Petrushkina, Seb Noury, Lorenzo Blanco, Kevin Swersky, Arun Ahuja, Thi Avrahami, Vedant Misra, Raoul de Liedekerke, Mariko Iinuma, Alex Polozov, Sarah York, George van den Driessche, Paul Michel, Justin Chiu, Rory Blevins, Zach Gleicher, Adrià Recasens, Alban Rrustemi, Elena Gribovskaya, Aurko Roy, Wiktor Gworek, Séb Arnold, Lisa Lee, James Lee-Thorp, Marcello Maggioni, Enrique Piqueras, Kartikeya Badola, Sharad Vikram, Lucas Gonzalez, Anirudh Baddepudi, Evan Senter, Jacob Devlin, James Qin, Michael Azzam, Maja Trebacz, Martin Polacek, Kashyap Krishnakumar, Shuo-yiin Chang, Matthew Tung, Ivo Penchev, Rishabh Joshi, Kate Olszewska, Carrie Muir, Mateo Wirth, Ale Jakse Hartman, Josh Newlan, Sheleem Kashem, Vijay Bolina, Elahe Dabir, Joost van Amersfoort, Zafarali Ahmed, James Cobon-Kerr, Aishwarya Kamath, Arnar Mar Hrafnkelsson, Le Hou, Ian Mackinnon, Alexandre Frechette, Eric Noland, Xiance 
Si, Emanuel Taropa, Dong Li, Phil Crone, Anmol Gulati, Sébastien Cevey, Jonas Adler, Ada Ma, David Silver, Simon Tokumine, Richard Powell, Stephan Lee, Michael Chang, Samer Hassan, Diana Mincu, Antoine Yang, Nir Levine, Jenny Brennan, Mingqiu Wang, Sarah Hodkinson, Jeffrey Zhao, Josh Lipschultz, Aedan Pope, Michael B. Chang, Cheng Li, Laurent El Shafey, Michela Paganini, Sholto Douglas, Bernd Bohnet, Fabio Pardo, Seth Odoom, Mihaela Rosca, Cicero Nogueira dos Santos, Kedar Soparkar, Arthur Guez, Tom Hudson, Steven Hansen, Chulayuth Asawaroengchai, Ravi Addanki, Tianhe Yu, Wojciech Stokowiec, Mina Khan, Justin Gilmer, Jaehoon Lee, Carrie Grimes Bostock, Keran Rong, Jonathan Caton, Pedram Pejman, Filip Pavetic, Geoff Brown, Vivek Sharma, Mario Lučić, Rajkumar Samuel, Josip Djolonga, Amol Mandhane, Lars Lowe Sjösund, Elena Buchatskaya, Elspeth White, Natalie Clay, Jiepu Jiang, Hyeontaek Lim, Ross Hemsley, Jane Labanowski, Nicola De Cao, David Steiner, Sayed Hadi Hashemi, Jacob Austin, Anita Gergely, Tim Blyth, Joe Stanton, Kaushik Shivakumar, Aditya Siddhant, Anders Andreassen, Carlos Araya, Nikhil Sethi, Rakesh Shivanna, Steven Hand, Ankur Bapna, Ali Khodaei, Antoine Miech, Garrett Tanzer, Andy Swing, Shantanu Thakoor, Zhufeng Pan, Zachary Nado, Stephanie Winkler, Dian Yu, Mohammad Saleh, Loren Maggiore, Iain Barr, Minh Giang, Thais Kagohara, Ivo Danihelka, Amit Marathe, Vladimir Feinberg, Mohamed Elhawaty, Nimesh Ghelani, Dan Horgan, Helen Miller, Lexi Walker, Richard Tanburn, Mukarram Tariq, Disha Shrivastava, Fei Xia, Chung-Cheng Chiu, Zoe Ashwood, Khuslen Baatarsukh, Sina Samangooei, Fred Alcober, Axel Stjerngren, Paul Komarek, Katerina Tsihlas, Anudhyan Boral, Ramona Comanescu, Jeremy Chen, Ruibo Liu, Dawn Bloxwich, Charlie Chen, Yanhua Sun, Fangxiaoyu Feng, Matthew Mauger, Xerxes Dotiwalla, Vincent Hellendoorn, Michael Sharman, Ivy Zheng, Krishna Haridasan, Gabe Barth-Maron, Craig Swanson, Dominika Rogozińska, Alek Andreev, Paul Kishan Rubenstein, Ruoxin Sang, 
Dan Hurt, Gamaleldin Elsayed, Renshen Wang, Dave Lacey, Anastasija Ilić, Yao Zhao, Lora Aroyo, Chimezie Iwuanyanwu, Vitaly Nikolaev, Balaji Lakshminarayanan, Sadegh Jazayeri, Raphaël Lopez Kaufman, Mani Varadarajan, Chetan Tekur, Doug Fritz, Misha Khalman, David Reitter, Kingshuk Dasgupta, Shourya Sarcar, Tina Ornduff, Javier Snaider, Fantine Huot, Johnson Jia, Rupert Kemp, Nejc Trdin, Anitha Vijayakumar, Lucy Kim, Christof Angermueller, Li Lao, Tianqi Liu, Haibin Zhang, David Engel, Somer Greene, Anaïs White, Jessica Austin, Lilly Taylor, Shereen Ashraf, Dangyi Liu, Maria Georgaki, Irene Cai, Yana Kulizhskaya, Sonam Goenka, Brennan Saeta, Kiran Vodrahalli, Christian Frank, Dario de Cesare, Brona Robenek, Harry Richardson, Mahmoud Alnahlawi, Christopher Yew, Priya Ponnapalli, Marco Tagliasacchi, Alex Korchemniy, Yelin Kim, Dinghua Li, Bill Rosgen, Zoe Ashwood, Kyle Levin, Jeremy Wiesner, Praseem Banzal, Praveen Srinivasan, Hongkun Yu, Çağlar Ünlü, David Reid, Zora Tung, Daniel Finchelstein, Ravin Kumar, Andre Elisseeff, Jin Huang, Ming Zhang, Rui Zhu, Ricardo Aguilar, Mai Giménez, Jiawei Xia, Olivier Dousse, Willi Gierke, Soheil Hassas Yeganeh, Damion Yates, Komal Jalan, Lu Li, Eri Latorre-Chimoto, Duc Dung Nguyen, Ken Durden, Praveen Kallakuri, Yaxin Liu, Matthew Johnson, Tomy Tsai, Alice Talbert, Jasmine Liu, Alexander Neitz, Chen Elkind, Marco Selvi, Mimi Jasarevic, Livio Baldini Soares, Albert Cui, Pidong Wang, Alek Wenjiao Wang, Xinyu Ye, Krystal Kallarackal, Lucia Loher, Hoi Lam, Josef Broder, Dan Holtmann-Rice, Nina Martin, Bramandia Ramadhana, Daniel Toyama, Mrinal Shukla, Sujoy Basu, Abhi Mohan, Nick Fernando, Noah Fiedel, Kim Paterson, Hui Li, Ankush Garg, Jane Park, DongHyun Choi, Diane Wu, Sankalp Singh, Zhishuai Zhang, Amir Globerson, Lily Yu, John Carpenter, Félix de Chaumont Quitry, Carey Radebaugh, Chu-Cheng Lin, Alex Tudor, Prakash Shroff, Drew Garmon, Dayou Du, Neera Vats, Han Lu, Shariq Iqbal, Alex Yakubovich, Nilesh Tripuraneni, James Manyika, 
Haroon Qureshi, Nan Hua, Christel Ngani, Maria Abi Raad, Hannah Forbes, Anna Bulanova, Jeff Stanway, Mukund Sundararajan, Victor Ungureanu, Colton Bishop, Yunjie Li, Balaji Venkatraman, Bo Li, Chloe Thornton, Salvatore Scellato, Nishesh Gupta, Yicheng Wang, Ian Tenney, Xihui Wu, Ashish Shenoy, Gabriel Carvajal, Diana Gage Wright, Ben Bariach, Zhuyun Xiao, Peter Hawkins, Sid Dalmia, Clement Farabet, Pedro Valenzuela, Quan Yuan, Chris Welty, Ananth Agarwal, Mia Chen, Wooyeol Kim, Brice Hulse, Nandita Dukkipati, Adam Paszke, Andrew Bolt, Elnaz Davoodi, Kiam Choo, Jennifer Beattie, Jennifer Prendki, Harsha Vashisht, Rebeca Santamaria-Fernandez, Luis C. Cobo, Jarek Wilkiewicz, David Madras, Ali Elqursh, Grant Uy, Kevin Ramirez, Matt Harvey, Tyler Liechty, Heiga Zen, Jeff Seibert, Clara Huiyi Hu, Mohamed Elhawaty, Andrey Khorlin, Maigo Le, Asaf Aharoni, Megan Li, Lily Wang, Sandeep Kumar, Alejandro Lince, Norman Casagrande, Jay Hoover, Dalia El Badawy, David Soergel, Denis Vnukov, Matt Miecnikowski, Jiri Simsa, Anna Koop, Praveen Kumar, Thibault Sellam, Daniel Vlasic, Samira Daruki, Nir Shabat, John Zhang, Guolong Su, Jiageng Zhang, Jeremiah Liu, Yi Sun, Evan Palmer, Alireza Ghaffarkhah, Xi Xiong, Victor Cotruta, Michael Fink, Lucas Dixon, Ashwin Sreevatsa, Adrian Goedeckemeyer, Alek Dimitriev, Mohsen Jafari, Remi Crocker, Nicholas FitzGerald, Aviral Kumar, Sanjay Ghemawat, Ivan Philips, Frederick Liu, Yannie Liang, Rachel Sterneck, Alena Repina, Marcus Wu, Laura Knight, Marin Georgiev, Hyo Lee, Harry Askham, Abhishek Chakladar, Annie Louis, Carl Crous, Hardie Cate, Dessie Petrova, Michael Quinn, Denese Owusu-Afriyie, Achintya Singhal, Nan Wei, Solomon Kim, Damien Vincent, Milad Nasr, Christopher A. 
Choquette-Choo, Reiko Tojo, Shawn Lu, Diego de Las Casas, Yuchung Cheng, Tolga Bolukbasi, Katherine Lee, Saaber Fatehi, Rajagopal Ananthanarayanan, Miteyan Patel, Charbel Kaed, Jing Li, Jakub Sygnowski, Shreyas Rammohan Belle, Zhe Chen, Jaclyn Konzelmann, Siim Põder, Roopal Garg, Vinod Koverkathu, Adam Brown, Chris Dyer, Rosanne Liu, Azade Nova, Jun Xu, Slav Petrov, Demis Hassabis, Koray Kavukcuoglu, Jeffrey Dean, Oriol Vinyals
66
6

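In this report, we present the latest model of the Gemini family, Gemini 1.5 Pro, a highly compute-efficient multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. Gemini 1.5 Pro achieves near-perfect recall on long-context retrieval tasks across modalities, improves the state of the art in long-document QA, long-video QA, and long-context ASR, and matches or surpasses Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5 Pro's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 2.1 (200k) and GPT-4 Turbo (128k). Finally, we highlight surprising new capabilities of large language models at the frontier: when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a level similar to a person who learned from the same content.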
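The "near-perfect retrieval (>99%)" figure refers to long-context retrieval tasks, in which a short fact is hidden somewhere in a long context and the model is asked to recall it. The sketch below shows how such a recall score could be computed. It is an illustration only, not the report's evaluation harness: `build_haystack`, `retrieval_recall`, and `toy_answer_fn` are hypothetical names, and the "model" is a plain string search standing in for a real model call.

```python
from typing import Callable, Sequence

MARKER = "The secret number is "


def build_haystack(filler: str, needle: str, n_sentences: int, depth: float) -> str:
    """Return n_sentences copies of `filler` with `needle` inserted at a relative depth.

    depth=0.0 places the needle first, depth=1.0 places it last.
    """
    sentences = [filler] * n_sentences
    sentences.insert(round(depth * n_sentences), needle)
    return " ".join(sentences)


def retrieval_recall(
    answer_fn: Callable[[str, str], str],
    needle: str,
    secret: str,
    lengths: Sequence[int],
    depths: Sequence[float],
    filler: str = "The grass is green.",
) -> float:
    """Fraction of (context length, needle depth) trials in which the reply contains the secret."""
    hits = 0
    trials = 0
    for n_sentences in lengths:
        for depth in depths:
            context = build_haystack(filler, needle, n_sentences, depth)
            reply = answer_fn(context, "What is the secret number?")
            hits += int(secret in reply)
            trials += 1
    return hits / trials


def toy_answer_fn(context: str, question: str) -> str:
    """Hypothetical stand-in for a model call: scans the context for the marker phrase."""
    start = context.find(MARKER)
    if start == -1:
        return ""
    end = context.find(".", start)
    return context[start + len(MARKER):end]
```

Swapping `toy_answer_fn` for a function that truncates the context before answering lowers the score, which is the failure mode a long-context evaluation of this kind is meant to expose.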

2

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Mar 8
By Haoyu Lu, Wen Liu, Bo Zhang, Bingxuan Wang, Kai Dong, Bo Liu, Jingxiang Sun, Tongzheng Ren, Zhuoshu Li, Yaofeng Sun, Chengqi Deng, Hanwei Xu, Zhenda Xie, Chong Ruan
46
4

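We present DeepSeek-VL, an open-source vision-language (VL) model designed for real-world vision and language understanding applications. Our approach is structured around three key dimensions. We strive to ensure our data is diverse, scalable, and extensively covers real-world scenarios, including web screenshots, PDFs, OCR, charts, and knowledge-based content, aiming for a comprehensive representation of practical contexts. Further, we create a use-case taxonomy from real user scenarios and construct an instruction-tuning dataset accordingly. Fine-tuning with this dataset substantially improves the model's user experience in practical applications. Considering efficiency and the demands of most real-world scenarios, DeepSeek-VL incorporates a hybrid vision encoder that efficiently processes high-resolution images (1024 x 1024) while maintaining a relatively low computational overhead. This design choice ensures the model can capture critical semantic and detailed information across various visual tasks. We posit that a proficient vision-language model should, first of all, possess strong language abilities. To preserve LLM capabilities during pretraining, we investigate an effective VL pretraining strategy that integrates LLM training from the beginning and carefully manages the competitive dynamics observed between the vision and language modalities. The DeepSeek-VL family (both 1.3B and 7B models) delivers a superior user experience as a vision-language chatbot in real-world applications, achieving state-of-the-art or competitive performance at the same model size while maintaining robust performance on language-centric benchmarks. We have made both the 1.3B and 7B models publicly accessible to foster innovation based on this foundation model.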

3

ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment

Mar 8
By Xiwei Hu, Rui Wang, Yixiao Fang, Bin Fu, Pei Cheng, Gang Yu
45
2

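Diffusion models have demonstrated remarkable performance in text-to-image generation. However, most widely used models still employ CLIP as their text encoder, which constrains their ability to comprehend dense prompts that encompass multiple objects, detailed attributes, complex relationships, long-text alignment, and so on. In this paper, we introduce an Efficient Large Language Model Adapter, termed ELLA, which equips text-to-image diffusion models with powerful large language models (LLMs) to enhance text alignment without training either the U-Net or the LLM. To seamlessly bridge the two pre-trained models, we investigate a range of semantic alignment connector designs and propose a novel module, the Timestep-Aware Semantic Connector (TSC), which dynamically extracts timestep-dependent conditions from the LLM. Our approach adapts semantic features at different stages of the denoising process, helping diffusion models interpret lengthy and intricate prompts over the sampling timesteps. Additionally, ELLA can be readily combined with community models and tools to improve their prompt-following capabilities. To assess text-to-image models on dense prompt following, we introduce the Dense Prompt Graph Benchmark (DPG-Bench), a challenging benchmark consisting of 1K dense prompts. Extensive experiments demonstrate that ELLA outperforms state-of-the-art methods in dense prompt following, particularly in compositions of multiple objects involving diverse attributes and relationships.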

4

Personalized Audiobook Recommendations at Spotify Through Graph Neural Networks

Mar 8
By Marco De Nadai, Francesco Fabbri, Paul Gigioli, Alice Wang, Ang Li, Fabrizio Silvestri, Laura Kim, Shawn Lin, Vladan Radosavljevic, Sandeep Ghael, David Nyhan, Hugues Bouchard, Mounia Lalmas-Roelleke, Andreas Damianou
25
1

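In the ever-evolving digital audio landscape, Spotify, well known for its music and talk content, has recently introduced audiobooks to its vast user base. While promising, this move presents significant challenges for personalized recommendations. Unlike music and podcasts, audiobooks, which were initially available for a fee, cannot be easily skimmed before purchase, which raises the stakes for the relevance of recommendations. Furthermore, introducing a new content type into an existing platform confronts extreme data sparsity, since most users are unfamiliar with the new content type. Lastly, recommending content to millions of users requires the model to react quickly and be scalable. To address these challenges, we leverage podcast and music user preferences and introduce 2T-HGNN, a scalable recommendation system comprising Heterogeneous Graph Neural Networks (HGNNs) and a Two Tower (2T) model. This novel approach uncovers nuanced item relationships while ensuring low latency and complexity. We decouple users from the HGNN graph and propose an innovative multi-link neighbor sampler. These choices, together with the 2T component, significantly reduce the complexity of the HGNN model. Empirical evaluations involving millions of users show a significant improvement in the quality of personalized recommendations, resulting in a +46% increase in the new-audiobook start rate and a +23% boost in streaming rates. Intriguingly, our model's impact extends beyond audiobooks, benefiting established products such as podcasts.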
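To make the Two Tower idea concrete, the sketch below scores items by the dot product between a user embedding and item embeddings that live in the same vector space. It is a minimal illustration under stated assumptions, not Spotify's 2T-HGNN: the weights, feature values, item names, and the helpers `linear_tower`, `dot`, and `recommend` are all hypothetical, and the fixed item vectors merely stand in for embeddings that a graph model would produce.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def linear_tower(weights: Sequence[Sequence[float]], features: Sequence[float]) -> Vector:
    """Project raw features into the shared embedding space with one linear layer."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity score between two embeddings."""
    return sum(x * y for x, y in zip(a, b))


def recommend(user_vec: Vector, item_vecs: Dict[str, Vector], k: int) -> List[str]:
    """Return the k item names with the highest dot-product score for this user."""
    ranked = sorted(item_vecs, key=lambda name: dot(user_vec, item_vecs[name]), reverse=True)
    return ranked[:k]


# Hypothetical user-tower weights and user features (e.g. music and podcast affinity).
USER_WEIGHTS = [[0.5, 0.5], [1.0, -1.0]]
user_embedding = linear_tower(USER_WEIGHTS, [0.8, 0.2])

# Hypothetical precomputed item embeddings (stand-ins for graph-model outputs).
ITEM_EMBEDDINGS = {
    "book_a": [1.0, 0.0],
    "book_b": [0.0, 1.0],
    "book_c": [-1.0, -1.0],
}

top_items = recommend(user_embedding, ITEM_EMBEDDINGS, k=2)
```

In a design like this, item embeddings can be computed offline, so serving a request only needs the user tower and a set of dot products, which is one common reason two-tower models are associated with low latency.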

5

CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion

Mar 8
By Wendi Zheng, Jiayan Teng, Zhuoyi Yang, Weihan Wang, Jidong Chen, Xiaotao Gu, Yuxiao Dong, Ming Ding, Jie Tang
24
3

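Recent advances in text-to-image generative systems have been largely driven by diffusion models. However, single-stage text-to-image diffusion models still face challenges in computational efficiency and in the refinement of image details. To tackle this issue, we propose CogView3, an innovative cascaded framework that enhances the performance of text-to-image diffusion. CogView3 is the first model to implement relay diffusion in text-to-image generation, executing the task by first creating low-resolution images and then applying relay-based super-resolution. This approach not only produces competitive text-to-image outputs but also greatly reduces both training and inference costs. Our experimental results show that CogView3 outperforms SDXL, the current state-of-the-art open-source text-to-image diffusion model, by 77.0% in human evaluations while requiring only about 1/2 of the inference time. The distilled variant of CogView3 achieves comparable performance while using only 1/10 of SDXL's inference time.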

6

CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model

Mar 8
By Zhengyi Wang, Yikai Wang, Yifei Chen, Chendong Xiang, Shuo Chen, Dajiang Yu, Chongxuan Li, Hang Su, Jun Zhu
22
2

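Feed-forward 3D generative models such as the Large Reconstruction Model (LRM) have demonstrated exceptional generation speed. However, transformer-based methods do not leverage the geometric priors of the triplane component in their architecture, which often leads to sub-optimal quality given the limited size of 3D data and slow training. In this work, we present the Convolutional Reconstruction Model (CRM), a high-fidelity feed-forward single-image-to-3D generative model. Recognizing the limitations posed by sparse 3D data, we highlight the necessity of integrating geometric priors into network design. CRM builds on the key observation that the visualization of a triplane exhibits spatial correspondence with six orthographic images. First, it generates six orthographic view images from a single input image, then feeds these images into a convolutional U-Net, leveraging its strong pixel-level alignment capabilities and significant bandwidth to create a high-resolution triplane. CRM further employs Flexicubes as its geometric representation, which facilitates direct end-to-end optimization on textured meshes. Overall, our model delivers a high-fidelity textured mesh from an image in just 10 seconds, without any test-time optimization.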

7

VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models

Mar 8
By Yabo Zhang, Yuxiang Wei, Xianhui Lin, Zheng Hui, Peiran Ren, Xuansong Xie, Xiangyang Ji, Wangmeng Zuo
21
1

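Text-to-image diffusion models (T2I) have demonstrated unprecedented capabilities in creating realistic and aesthetic images. In contrast, text-to-video diffusion models (T2V) still lag far behind in frame quality and text alignment, owing to the insufficient quality and quantity of training videos. In this paper, we introduce VideoElevator, a training-free, plug-and-play method that elevates the performance of T2V using the superior capabilities of T2I. Unlike conventional T2V sampling (i.e., joint temporal and spatial modeling), VideoElevator explicitly decomposes each sampling step into temporal motion refining and spatial quality elevating. Specifically, temporal motion refining uses an encapsulated T2V to enhance temporal consistency, followed by inversion to the noise distribution required by T2I. Spatial quality elevating then harnesses an inflated T2I to directly predict a less noisy latent, adding more photo-realistic details. We conducted extensive experiments under combinations of various T2V and T2I models. The results show that VideoElevator not only improves the performance of T2V baselines with a foundational T2I, but also facilitates stylized video synthesis with a personalized T2I. Our code is available at https://github.com/YBYBZhang/VideoElevator.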
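The decomposition of each sampling step described above can be pictured as a loop that runs three stages in a fixed order: temporal motion refining, inversion to the T2I noise level, then spatial quality elevating. The sketch below captures only that control flow. It is not the code from the linked repository: `elevated_sampling` and the three toy stages are hypothetical stand-ins, and a "latent" here is just one number per frame.

```python
from typing import Callable, List, Sequence

Latent = List[float]  # toy stand-in: one value per video frame
Stage = Callable[[Latent, int], Latent]


def elevated_sampling(
    latent: Latent,
    timesteps: Sequence[int],
    temporal_refine: Stage,
    invert_to_t2i: Stage,
    spatial_elevate: Stage,
) -> Latent:
    """Run every sampling step as: temporal refining, inversion, spatial elevating."""
    for t in timesteps:
        latent = temporal_refine(latent, t)   # T2V stage: improve temporal consistency
        latent = invert_to_t2i(latent, t)     # map to the noise level the T2I stage expects
        latent = spatial_elevate(latent, t)   # T2I stage: predict a less noisy latent
    return latent


def smooth_frames(latent: Latent, t: int) -> Latent:
    """Toy temporal stage: average each frame with its immediate neighbours."""
    smoothed = []
    for i in range(len(latent)):
        window = latent[max(0, i - 1):i + 2]
        smoothed.append(sum(window) / len(window))
    return smoothed


def identity_inversion(latent: Latent, t: int) -> Latent:
    """Toy inversion stage: a placeholder that returns the latent unchanged."""
    return list(latent)


def halve_noise(latent: Latent, t: int) -> Latent:
    """Toy spatial stage: shrink every value, mimicking a move toward a cleaner latent."""
    return [0.5 * value for value in latent]
```

A call such as `elevated_sampling([4.0, 2.0, 4.0], [2, 1], smooth_frames, identity_inversion, halve_noise)` runs two steps of the loop; real T2V and T2I models would replace the toy stages.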
