ChatPaper.aiChatPaper.ai
首頁

arXiv

HuggingFace

定價賬戶工作台

•
•

•
•

•
•

•
•

•
•

Footer

Company name

ChatPaper.ai: Your advanced AI reading assistant.

Contact us: [email protected]

X (Twitter)

Products

  • AI Search
  • AI Mind Map
  • Arxiv Summary
  • Huggingface Summary

Support

  • FAQ
  • Contact

Company

  • Blog
  • Privacy Policy
  • Terms of Service

Available Languages

  • 🇬🇧English
  • 🇨🇳中文简体
  • 🇭🇰繁體中文
  • 🇯🇵日本語
  • 🇰🇷한국어
  • 🇩🇪Deutsch
  • 🇫🇷Français
  • 🇷🇺Русский
  • 🇪🇸Español

© 2025 chatpaper.ai All rights reserved.

AI研究論文每日精選

每日精選AI研究論文及翻譯

MM1:多模態LLM預訓練的方法、分析和見解
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, Bowen Zhang, Philipp Dufter, Dhruti Shah, Xianzhi Du, Futang Peng, Floris Weers, Anton Belyi, Haotian Zhang, Karanjeet Singh, Doug Kang, Hongyu Hè, Max Schwarzer, Tom Gunter, Xiang Kong, Aonan Zhang, Jianyu Wang, Chong Wang, Nan Du, Tao Lei, Sam Wiseman, Guoli Yin, Mark Lee, Zirui Wang, Ruoming Pang, Peter Grasch, Alexander Toshev, Yinfei Yang•Mar 14, 2024•12812

Quiet-STaR:語言模型可以自我教導在說話之前思考
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

Eric Zelikman, Georges Harik, Yijia Shao, Varuna Jayasiri, Nick Haber, Noah D. Goodman•Mar 14, 2024•787

利用WebSight數據集解鎖將Web截圖轉換為HTML代碼
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset

Hugo Laurençon, Léo Tronchon, Victor Sanh•Mar 14, 2024•564

通過通用語言界面實現通用視覺Transformer:GiT
GiT: Towards Generalist Vision Transformer through Universal Language Interface

Haiyang Wang, Hao Tang, Li Jiang, Shaoshuai Shi, Muhammad Ferjad Naeem, Hongsheng Li, Bernt Schiele, Liwei Wang•Mar 14, 2024•2811

StreamMultiDiffusion:基於區域語義控制的即時互動生成
StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control

Jaerin Lee, Daniel Sungho Jung, Kanggeon Lee, Kyoung Mu Lee•Mar 14, 2024•273

透過分解擴散蒸餾進行影片編輯
Video Editing via Factorized Diffusion Distillation

Uriel Singer, Amit Zohar, Yuval Kirstain, Shelly Sheynin, Adam Polyak, Devi Parikh, Yaniv Taigman•Mar 14, 2024•242

BurstAttention:一個針對極長序列的高效分佈式注意力框架
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences

Sun Ao, Weilin Zhao, Xu Han, Cheng Yang, Zhiyuan Liu, Chuan Shi, Maosong Sun, Shengnan Wang, Teng Su•Mar 14, 2024•232

Glyph-ByT5:一種針對精確視覺文本呈現的自定義文本編碼器
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering

Zeyu Liu, Weicong Liang, Zhanhao Liang, Chong Luo, Ji Li, Gao Huang, Yuhui Yuan•Mar 14, 2024•181

Griffon v2:透過高解析度縮放和視覺-語言共指推進多模式感知。
Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring

Yufei Zhan, Yousong Zhu, Hongyin Zhao, Fan Yang, Ming Tang, Jinqiao Wang•Mar 14, 2024•163

視訊瑪巴套件:狀態空間模型作為視訊理解的多功能替代方案
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

Guo Chen, Yifei Huang, Jilan Xu, Baoqi Pei, Zhe Chen, Zhiqi Li, Jiahao Wang, Kunchang Li, Tong Lu, Limin Wang•Mar 14, 2024•151

3D-VLA:一個3D視覺-語言-動作生成世界模型
3D-VLA: A 3D Vision-Language-Action Generative World Model

Haoyu Zhen, Xiaowen Qiu, Peihao Chen, Jincheng Yang, Xin Yan, Yilun Du, Yining Hong, Chuang Gan•Mar 14, 2024•101

VisionGPT-3D:一個通用的多模式代理,用於增強3D視覺理解。
VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding

Chris Kelly, Luhui Hu, Jiayin Hu, Yu Tian, Deshun Yang, Bang Yang, Cindy Yang, Zihao Li, Zaoshan Huang, Yuexian Zou•Mar 14, 2024•101

Veagle:多模態表示學習的進展
Veagle: Advancements in Multimodal Representation Learning

Rajat Chawla, Arkajit Datta, Tushar Verma, Adarsh Jha, Anmol Gautam, Ayush Vatsal, Sukrit Chaterjee, Mukunda NS, Ishaan Bhola•Jan 18, 2024•101

LocalMamba:具有窗口式選擇掃描的視覺狀態空間模型
LocalMamba: Visual State Space Model with Windowed Selective Scan

Tao Huang, Xiaohuan Pei, Shan You, Fei Wang, Chen Qian, Chang Xu•Mar 14, 2024•91