ChatPaper.ai

Daily AI Research Paper Picks

Daily curated AI research papers with translations

MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs

Ziyu Liu, Tao Chu, Yuhang Zang, Xilin Wei, Xiaoyi Dong, Pan Zhang, Zijian Liang, Yuanjun Xiong, Yu Qiao, Dahua Lin, Jiaqi Wang•Jun 17, 2024•646

DataComp-LM: In search of the next generation of training sets for language models

Jeffrey Li, Alex Fang, Georgios Smyrnis, Maor Ivgi, Matt Jordan, Samir Gadre, Hritik Bansal, Etash Guha, Sedrick Keh, Kushal Arora, Saurabh Garg, Rui Xin, Niklas Muennighoff, Reinhard Heckel, Jean Mercat, Mayee Chen, Suchin Gururangan, Mitchell Wortsman, Alon Albalak, Yonatan Bitton, Marianna Nezhurina, Amro Abbas, Cheng-Yu Hsieh, Dhruba Ghosh, Josh Gardner, Maciej Kilian, Hanlin Zhang, Rulin Shao, Sarah Pratt, Sunny Sanyal, Gabriel Ilharco, Giannis Daras, Kalyani Marathe, Aaron Gokaslan, Jieyu Zhang, Khyathi Chandu, Thao Nguyen, Igor Vasiljevic, Sham Kakade, Shuran Song, Sujay Sanghavi, Fartash Faghri, Sewoong Oh, Luke Zettlemoyer, Kyle Lo, Alaaeldin El-Nouby, Hadi Pouransari, Alexander Toshev, Stephanie Wang, Dirk Groeneveld, Luca Soldani, Pang Wei Koh, Jenia Jitsev, Thomas Kollar, Alexandros G. Dimakis, Yair Carmon, Achal Dave, Ludwig Schmidt, Vaishaal Shankar•Jun 17, 2024•534

mDPO: Conditional Preference Optimization for Multimodal Large Language Models

Fei Wang, Wenxuan Zhou, James Y. Huang, Nan Xu, Sheng Zhang, Hoifung Poon, Muhao Chen•Jun 17, 2024•391

THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation

Seo Hyun Kim, Kai Tzu-iunn Ong, Taeyoon Kwon, Namyoung Kim, Keummin Ka, SeongHyeon Bae, Yohan Jo, Seung-won Hwang, Dongha Lee, Jinyoung Yeo•Jun 16, 2024•351

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

Yiwen Chen, Tong He, Di Huang, Weicai Ye, Sijin Chen, Jiaxiang Tang, Xin Chen, Zhongang Cai, Lei Yang, Gang Yu, Guosheng Lin, Chi Zhang•Jun 14, 2024•342

How Do Large Language Models Acquire Factual Knowledge During Pretraining?

Hoyeon Chang, Jinho Park, Seonghyeon Ye, Sohee Yang, Youngkyung Seo, Du-Seong Chang, Minjoon Seo•Jun 17, 2024•321

VideoLLM-online: Online Video Large Language Model for Streaming Video

Joya Chen, Zhaoyang Lv, Shiwei Wu, Kevin Qinghong Lin, Chenan Song, Difei Gao, Jia-Wei Liu, Ziteng Gao, Dongxing Mao, Mike Zheng Shou•Jun 17, 2024•251

A Simple and Effective L_2 Norm-Based Strategy for KV Cache Compression

Alessio Devoto, Yu Zhao, Simone Scardapane, Pasquale Minervini•Jun 17, 2024•243

Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models

Bingqi Ma, Zhuofan Zong, Guanglu Song, Hongsheng Li, Yu Liu•Jun 17, 2024•224

MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens

Anas Awadalla, Le Xue, Oscar Lo, Manli Shu, Hannah Lee, Etash Kumar Guha, Matt Jordan, Sheng Shen, Mohamed Awadalla, Silvio Savarese, Caiming Xiong, Ran Xu, Yejin Choi, Ludwig Schmidt•Jun 17, 2024•211

GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities

Sreyan Ghosh, Sonal Kumar, Ashish Seth, Chandra Kiran Reddy Evuru, Utkarsh Tyagi, S Sakshi, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha•Jun 17, 2024•201

LLaNA: Large Language and NeRF Assistant

Andrea Amaduzzi, Pierluigi Zama Ramirez, Giuseppe Lisanti, Samuele Salti, Luigi Di Stefano•Jun 17, 2024•183

From Pixels to Prose: A Large Dataset of Dense Image Captions

Vasu Singla, Kaiyu Yue, Sukriti Paul, Reza Shirkavand, Mayuka Jayawardhana, Alireza Ganjdanesh, Heng Huang, Abhinav Bhatele, Gowthami Somepalli, Tom Goldstein•Jun 14, 2024•182

WPO: Enhancing RLHF with Weighted Preference Optimization

Wenxuan Zhou, Ravi Agrawal, Shujian Zhang, Sathish Reddy Indurthi, Sanqiang Zhao, Kaiqiang Song, Silei Xu, Chenguang Zhu•Jun 17, 2024•151

In-Context Editing: Learning Knowledge from Self-Induced Distributions

Siyuan Qi, Bangcheng Yang, Kailin Jiang, Xiaobo Wang, Jiaqi Li, Yifan Zhong, Yaodong Yang, Zilong Zheng•Jun 17, 2024•155

Pandora: Towards General World Model with Natural Language Actions and Video States

Jiannan Xiang, Guangyi Liu, Yi Gu, Qiyue Gao, Yuting Ning, Yuheng Zha, Zeyu Feng, Tianhua Tao, Shibo Hao, Yemin Shi, Zhengzhong Liu, Eric P. Xing, Zhiting Hu•Jun 12, 2024•151

WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences

Yujie Lu, Dongfu Jiang, Wenhu Chen, William Yang Wang, Yejin Choi, Bill Yuchen Lin•Jun 16, 2024•144

L4GM: Large 4D Gaussian Reconstruction Model

Jiawei Ren, Kevin Xie, Ashkan Mirzaei, Hanxue Liang, Xiaohui Zeng, Karsten Kreis, Ziwei Liu, Antonio Torralba, Sanja Fidler, Seung Wook Kim, Huan Ling•Jun 14, 2024•131

Unifying Multimodal Retrieval via Document Screenshot Embedding

Xueguang Ma, Sheng-Chieh Lin, Minghan Li, Wenhu Chen, Jimmy Lin•Jun 17, 2024•101

Task Me Anything

Jieyu Zhang, Weikai Huang, Zixian Ma, Oscar Michel, Dong He, Tanmay Gupta, Wei-Chiu Ma, Ali Farhadi, Aniruddha Kembhavi, Ranjay Krishna•Jun 17, 2024•81

Vid3D: Synthesis of Dynamic 3D Scenes using 2D Video Diffusion

Rishab Parthasarathy, Zack Ankner, Aaron Gokaslan•Jun 17, 2024•81

Just How Flexible are Neural Networks in Practice?

Ravid Shwartz-Ziv, Micah Goldblum, Arpit Bansal, C. Bayan Bruss, Yann LeCun, Andrew Gordon Wilson•Jun 17, 2024•71

Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning

Jifan Zhang, Lalit Jain, Yang Guo, Jiayi Chen, Kuan Lok Zhou, Siddharth Suresh, Andrew Wagenmaker, Scott Sievert, Timothy Rogers, Kevin Jamieson, Robert Mankoff, Robert Nowak•Jun 15, 2024•72

Evaluating Open Language Models Across Task Types, Application Domains, and Reasoning Types: An In-Depth Experimental Analysis

Neelabh Sinha, Vinija Jain, Aman Chadha•Jun 17, 2024•61

Breaking the Attention Bottleneck

Kalle Hilsenbek•Jun 16, 2024•44

HiddenTables & PyQTax: A Cooperative Game and Dataset For TableQA to Ensure Scale and Data Privacy Across a Myriad of Taxonomies

William Watson, Nicole Cho, Tucker Balch, Manuela Veloso•Jun 16, 2024•41

CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training

David Brandfonbrener, Hanlin Zhang, Andreas Kirsch, Jonathan Richard Schwarz, Sham Kakade•Jun 15, 2024•41

Consistency^2: Consistent and Fast 3D Painting with Latent Consistency Models

Tianfu Wang, Anton Obukhov, Konrad Schindler•Jun 17, 2024•31

Deep Bayesian Active Learning for Preference Modeling in Large Language Models

Luckeciano C. Melo, Panagiotis Tigas, Alessandro Abate, Yarin Gal•Jun 14, 2024•21