ChatPaper.ai

Daily Picks of AI Research Papers

Daily curated AI research papers with translations

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, Guoli Yin, Mark Lee, Zirui Wang, Ruoming Pang, Peter Grasch, Alexander Toshev, Yinfei Yang•Mar 14, 2024•12812

Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

Eric Zelikman, Georges Harik, Yijia Shao, Varuna Jayasiri, Nick Haber, Noah D. Goodman•Mar 14, 2024•787

Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset

Hugo Laurençon, Léo Tronchon, Victor Sanh•Mar 14, 2024•564

GiT: Towards Generalist Vision Transformer through Universal Language Interface

Haiyang Wang, Hao Tang, Li Jiang, Shaoshuai Shi, Muhammad Ferjad Naeem, Hongsheng Li, Bernt Schiele, Liwei Wang•Mar 14, 2024•2811

StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control

Jaerin Lee, Daniel Sungho Jung, Kanggeon Lee, Kyoung Mu Lee•Mar 14, 2024•273

Video Editing via Factorized Diffusion Distillation

Uriel Singer, Amit Zohar, Yuval Kirstain, Shelly Sheynin, Adam Polyak, Devi Parikh, Yaniv Taigman•Mar 14, 2024•242

BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences

Sun Ao, Weilin Zhao, Xu Han, Cheng Yang, Zhiyuan Liu, Chuan Shi, Maosong Sun, Shengnan Wang, Teng Su•Mar 14, 2024•232

Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering

Zeyu Liu, Weicong Liang, Zhanhao Liang, Chong Luo, Ji Li, Gao Huang, Yuhui Yuan•Mar 14, 2024•181

Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring

Yufei Zhan, Yousong Zhu, Hongyin Zhao, Fan Yang, Ming Tang, Jinqiao Wang•Mar 14, 2024•163

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

Guo Chen, Yifei Huang, Jilan Xu, Baoqi Pei, Zhe Chen, Zhiqi Li, Jiahao Wang, Kunchang Li, Tong Lu, Limin Wang•Mar 14, 2024•151

3D-VLA: A 3D Vision-Language-Action Generative World Model

Haoyu Zhen, Xiaowen Qiu, Peihao Chen, Jincheng Yang, Xin Yan, Yilun Du, Yining Hong, Chuang Gan•Mar 14, 2024•101

VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding

Chris Kelly, Luhui Hu, Jiayin Hu, Yu Tian, Deshun Yang, Bang Yang, Cindy Yang, Zihao Li, Zaoshan Huang, Yuexian Zou•Mar 14, 2024•101

Veagle: Advancements in Multimodal Representation Learning

Rajat Chawla, Arkajit Datta, Tushar Verma, Adarsh Jha, Anmol Gautam, Ayush Vatsal, Sukrit Chaterjee, Mukunda NS, Ishaan Bhola•Jan 18, 2024•101

LocalMamba: Visual State Space Model with Windowed Selective Scan

Tao Huang, Xiaohuan Pei, Shan You, Fei Wang, Chen Qian, Chang Xu•Mar 14, 2024•91