ChatPaper.aiChatPaper.ai
首页

arXiv

HuggingFace

定价账户工作台

•
•

•
•

•
•

•
•

•
•

Footer

Company name

ChatPaper.ai: Your advanced AI reading assistant.

Contact us: [email protected]

X (Twitter)

Products

  • AI Search
  • AI Mind Map
  • Arxiv Summary
  • Huggingface Summary

Support

  • FAQ
  • Contact

Company

  • Blog
  • Privacy Policy
  • Terms of Service

Available Languages

  • 🇬🇧English
  • 🇨🇳中文简体
  • 🇭🇰繁體中文
  • 🇯🇵日本語
  • 🇰🇷한국어
  • 🇩🇪Deutsch
  • 🇫🇷Français
  • 🇷🇺Русский
  • 🇪🇸Español

© 2025 chatpaper.ai All rights reserved.

AI研究论文每日精选

每日精选AI研究论文及翻译

深度任意 V2
Depth Anything V2

Lihe Yang, Bingyi Kang, Zilong Huang, Zhen Zhao, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao•Jun 13, 2024•10314

一幅图像价值超过16x16的块:探索变压器在单个像素上的应用
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

Duy-Kien Nguyen, Mahmoud Assran, Unnat Jain, Martin R. Oswald, Cees G. M. Snoek, Xinlei Chen•Jun 13, 2024•522

Transformer遇见神经算法推理器
Transformers meet Neural Algorithmic Reasoners

Wilfried Bounsi, Borja Ibarz, Andrew Dudzik, Jessica B. Hamrick, Larisa Markeeva, Alex Vitvitskyi, Razvan Pascanu, Petar Veličković•Jun 13, 2024•451

OpenVLA:一个开源的视觉-语言-动作模型
OpenVLA: An Open-Source Vision-Language-Action Model

Moo Jin Kim, Karl Pertsch, Siddharth Karamcheti, Ted Xiao, Ashwin Balakrishna, Suraj Nair, Rafael Rafailov, Ethan Foster, Grace Lam, Pannag Sanketi, Quan Vuong, Thomas Kollar, Benjamin Burchfiel, Russ Tedrake, Dorsa Sadigh, Sergey Levine, Percy Liang, Chelsea Finn•Jun 13, 2024•401

Samba:用于高效无限上下文语言建模的简单混合状态空间模型
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling

Liliang Ren, Yang Liu, Yadong Lu, Yelong Shen, Chen Liang, Weizhu Chen•Jun 11, 2024•394

通过多分辨率扩散模型缓解图像生成中的失真问题
Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models

Qihao Liu, Zhanpeng Zeng, Ju He, Qihang Yu, Xiaohui Shen, Liang-Chieh Chen•Jun 13, 2024•301

时间考验:评估大型语言模型在时间推理上的基准测试
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning

Bahare Fatemi, Mehran Kazemi, Anton Tsitsulin, Karishma Malkan, Jinyeong Yim, John Palowitch, Sungyong Seo, Jonathan Halcrow, Bryan Perozzi•Jun 13, 2024•281

DiTFastAttn:扩散变压器模型的注意力压缩
DiTFastAttn: Attention Compression for Diffusion Transformer Models

Zhihang Yuan, Pu Lu, Hanling Zhang, Xuefei Ning, Linfeng Zhang, Tianchen Zhao, Shengen Yan, Guohao Dai, Yu Wang•Jun 12, 2024•261

视觉素描板:作为多模态语言模型的视觉思维链的素描
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models

Yushi Hu, Weijia Shi, Xingyu Fu, Dan Roth, Mari Ostendorf, Luke Zettlemoyer, Noah A Smith, Ranjay Krishna•Jun 13, 2024•221

解释定制扩散模型的权重空间
Interpreting the Weight Space of Customized Diffusion Models

Amil Dravid, Yossi Gandelsman, Kuan-Chieh Wang, Rameen Abdal, Gordon Wetzstein, Alexei A. Efros, Kfir Aberman•Jun 13, 2024•201

MuirBench:用于稳健多图像理解的全面基准。
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

Fei Wang, Xingyu Fu, James Y. Huang, Zekun Li, Qin Liu, Xiaogeng Liu, Mingyu Derek Ma, Nan Xu, Wenxuan Zhou, Kai Zhang, Tianyi Lorena Yan, Wenjie Jacky Mo, Hsiang-Hui Liu, Pan Lu, Chunyuan Li, Chaowei Xiao, Kai-Wei Chang, Dan Roth, Sheng Zhang, Hoifung Poon, Muhao Chen•Jun 13, 2024•202

HelpSteer2:用于训练表现优异的奖励模型的开源数据集
HelpSteer2: Open-source dataset for training top-performing reward models

Zhilin Wang, Yi Dong, Olivier Delalleau, Jiaqi Zeng, Gerald Shen, Daniel Egert, Jimmy J. Zhang, Makesh Narsimhan Sreedhar, Oleksii Kuchaiev•Jun 12, 2024•193

mOSCAR:一个大规模多语言和多模态的文档级语料库
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus

Matthieu Futeral, Armel Zebaze, Pedro Ortiz Suarez, Julien Abadji, Rémi Lacroix, Cordelia Schmid, Rachel Bawden, Benoît Sagot•Jun 13, 2024•164

CS-Bench:面向计算机科学掌握的大型语言模型综合基准
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery

Xiaoshuai Song, Muxi Diao, Guanting Dong, Zhengyang Wang, Yujia Fu, Runqi Qiao, Zhexu Wang, Dayuan Fu, Huangxuan Wu, Bin Liang, Weihao Zeng, Yejie Wang, Zhuoma GongQue, Jianing Yu, Qiuna Tan, Weiran Xu•Jun 12, 2024•164

4M-21:一种适用于数十种任务和模态的任意-任意视觉模型
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities

Roman Bachmann, Oğuzhan Fatih Kar, David Mizrahi, Ali Garjani, Mingfei Gao, David Griffiths, Jiaming Hu, Afshin Dehghan, Amir Zamir•Jun 13, 2024•152

EMMA:您的文本到图像扩散模型可以秘密接受多模态提示
EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts

Yucheng Han, Rui Wang, Chi Zhang, Juntao Hu, Pei Cheng, Bin Fu, Hanwang Zhang•Jun 13, 2024•143

探索规模化全模态预训练的极限
Explore the Limits of Omni-modal Pretraining at Scale

Yiyuan Zhang, Handong Li, Jing Liu, Xiangyu Yue•Jun 13, 2024•113

基于认知启发的能量驱动世界模型
Cognitively Inspired Energy-Based World Models

Alexi Gladstone, Ganesh Nanduru, Md Mofijul Islam, Aman Chadha, Jundong Li, Tariq Iqbal•Jun 13, 2024•107

Mistral-C2F:RLHF 中的分析和推理增强的粗到细的演员和有效合并的LLMs
Mistral-C2F: Coarse to Fine Actor for Analytical and Reasoning Enhancement in RLHF and Effective-Merged LLMs

Chen Zheng, Ke Sun, Xun Zhou•Jun 12, 2024•102

常识-T2I挑战:文本到图像生成模型能否理解常识?
Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?

Xingyu Fu, Muyu He, Yujie Lu, William Yang Wang, Dan Roth•Jun 11, 2024•91

TC-Bench:在文本到视频和图像到视频生成中对时间组合性进行基准测试
TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation

Weixi Feng, Jiachen Li, Michael Saxon, Tsu-jui Fu, Wenhu Chen, William Yang Wang•Jun 12, 2024•81

Real3D:利用真实世界图像扩展大型重建模型
Real3D: Scaling Up Large Reconstruction Models with Real-World Images

Hanwen Jiang, Qixing Huang, Georgios Pavlakos•Jun 12, 2024•71

估计生成式人工智能的幻觉率
Estimating the Hallucination Rate of Generative AI

Andrew Jesson, Nicolas Beltran-Velez, Quentin Chu, Sweta Karlekar, Jannik Kossen, Yarin Gal, John P. Cunningham, David Blei•Jun 11, 2024•71

MLKV:用于内存高效Transformer解码的多层键-值头
MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding

Zayd Muhammad Kawakibi Zuhri, Muhammad Farid Adilazuarda, Ayu Purwarianti, Alham Fikri Aji•Jun 13, 2024•62

语言模型委员会:通过共识对高度主观任务上的基础模型进行基准测试
Language Model Council: Benchmarking Foundation Models on Highly Subjective Tasks by Consensus

Justin Zhao, Flor Miriam Plaza-del-Arco, Amanda Cercas Curry•Jun 12, 2024•61

CVQA:具有跨文化多语言特点的视觉问答基准。
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

David Romero, Chenyang Lyu, Haryo Akbarianto Wibowo, Teresa Lynn, Injy Hamed, Aditya Nanda Kishore, Aishik Mandal, Alina Dragonetti, Artem Abzaliev, Atnafu Lambebo Tonja, Bontu Fufa Balcha, Chenxi Whitehouse, Christian Salamea, Dan John Velasco, David Ifeoluwa Adelani, David Le Meur, Emilio Villa-Cueva, Fajri Koto, Fauzan Farooqui, Frederico Belcavello, Ganzorig Batnasan, Gisela Vallejo, Grainne Caulfield, Guido Ivetta, Haiyue Song, Henok Biadglign Ademtew, Hernán Maina, Holy Lovenia, Israel Abebe Azime, Jan Christian Blaise Cruz, Jay Gala, Jiahui Geng, Jesus-German Ortiz-Barajas, Jinheon Baek, Jocelyn Dunstan, Laura Alonso Alemany, Kumaranage Ravindu Yasas Nagasinghe, Luciana Benotti, Luis Fernando D'Haro, Marcelo Viridiano, Marcos Estecha-Garitagoitia, Maria Camila Buitrago Cabrera, Mario Rodríguez-Cantelar, Mélanie Jouitteau, Mihail Mihaylov, Mohamed Fazli Mohamed Imam, Muhammad Farid Adilazuarda, Munkhjargal Gochoo, Munkh-Erdene Otgonbold, Naome Etori, Olivier Niyomugisha, Paula Mónica Silva, Pranjal Chitale, Raj Dabre, Rendi Chevi, Ruochen Zhang, Ryandito Diandaru, Samuel Cahyawijaya, Santiago Góngora, Soyeong Jeong, Sukannya Purkayastha, Tatsuki Kuribayashi, Thanmay Jayakumar, Tiago Timponi Torrent, Toqeer Ehsan, Vladimir Araujo, Yova Kementchedjhieva, Zara Burzo, Zheng Wei Lim, Zheng Xin Yong, Oana Ignat, Joan Nwatu, Rada Mihalcea, Thamar Solorio, Alham Fikri Aji•Jun 10, 2024•61

LRM-Zero:使用合成数据训练大型重建模型
LRM-Zero: Training Large Reconstruction Models with Synthesized Data

Desai Xie, Sai Bi, Zhixin Shu, Kai Zhang, Zexiang Xu, Yi Zhou, Sören Pirk, Arie Kaufman, Xin Sun, Hao Tan•Jun 13, 2024•51

通过模式插值理解扩散模型中的幻觉
Understanding Hallucinations in Diffusion Models through Mode Interpolation

Sumukh K Aithal, Pratyush Maini, Zachary C. Lipton, J. Zico Kolter•Jun 13, 2024•51

CMC-Bench:走向视觉信号压缩的新范式
CMC-Bench: Towards a New Paradigm of Visual Signal Compression

Chunyi Li, Xiele Wu, Haoning Wu, Donghui Feng, Zicheng Zhang, Guo Lu, Xiongkuo Min, Xiaohong Liu, Guangtao Zhai, Weisi Lin•Jun 13, 2024•52

Toffee:面向主题驱动的文本到图像生成的高效百万级数据集构建
Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation

Yufan Zhou, Ruiyi Zhang, Kaizhi Zheng, Nanxuan Zhao, Jiuxiang Gu, Zichao Wang, Xin Eric Wang, Tong Sun•Jun 13, 2024•52