ChatPaper.ai
Daily AI Research Paper Picks

Daily curated AI research papers with translations

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Zhangchen Xu, Fengqing Jiang, Luyao Niu, Yuntian Deng, Radha Poovendran, Yejin Choi, Bill Yuchen Lin • Jun 12, 2024 • 705

NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing

Ting-Hsuan Chen, Jiewen Chan, Hau-Shiang Shiu, Shih-Han Yen, Chang-Han Yeh, Yu-Lun Liu • Jun 10, 2024 • 532

What If We Recaption Billions of Web Images with LLaMA-3?

Xianhang Li, Haoqin Tu, Mude Hui, Zeyu Wang, Bingchen Zhao, Junfei Xiao, Sucheng Ren, Jieru Mei, Qing Liu, Huangjie Zheng, Yuyin Zhou, Cihang Xie • Jun 12, 2024 • 421

MotionClone: Training-Free Motion Cloning for Controllable Video Generation

Pengyang Ling, Jiazi Bu, Pan Zhang, Xiaoyi Dong, Yuhang Zang, Tong Wu, Huaian Chen, Jiaqi Wang, Yi Jin • Jun 8, 2024 • 424

Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion

Fangfu Liu, Hanyang Wang, Shunyu Yao, Shengjun Zhang, Jie Zhou, Yueqi Duan • Jun 6, 2024 • 404

Are We Done with MMLU?

Aryo Pradipta Gema, Joshua Ong Jun Leang, Giwon Hong, Alessio Devoto, Alberto Carlo Maria Mancino, Rohit Saxena, Xuanli He, Yu Zhao, Xiaotang Du, Mohammad Reza Ghasemi Madani, Claire Barale, Robert McHardy, Joshua Harris, Jean Kaddour, Emile van Krieken, Pasquale Minervini • Jun 6, 2024 • 401

PowerInfer-2: Fast Large Language Model Inference on a Smartphone

Zhenliang Xue, Yixin Song, Zeyu Mi, Le Chen, Yubin Xia, Haibo Chen • Jun 10, 2024 • 395

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Zesen Cheng, Sicong Leng, Hang Zhang, Yifei Xin, Xin Li, Guanzheng Chen, Yongxin Zhu, Wenqi Zhang, Ziyang Luo, Deli Zhao, Lidong Bing • Jun 11, 2024 • 382

3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination

Jianing Yang, Xuweiyi Chen, Nikhil Madaan, Madhavan Iyengar, Shengyi Qian, David F. Fouhey, Joyce Chai • Jun 7, 2024 • 312

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

Xuehai He, Weixi Feng, Kaizhi Zheng, Yujie Lu, Wanrong Zhu, Jiachen Li, Yue Fan, Jianfeng Wang, Linjie Li, Zhengyuan Yang, Kevin Lin, William Yang Wang, Lijuan Wang, Xin Eric Wang • Jun 12, 2024 • 290

Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters

Yixin Song, Haotong Xie, Zhengyan Zhang, Bo Wen, Li Ma, Zeyu Mi, Haibo Chen • Jun 10, 2024 • 282

FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation

Xinzhi Mu, Li Chen, Bohan Chen, Shuyang Gu, Jianmin Bao, Dong Chen, Ji Li, Yuhui Yuan • Jun 12, 2024 • 210

Discovering Preference Optimization Algorithms with and for Large Language Models

Chris Lu, Samuel Holt, Claudio Fanconi, Alex J. Chan, Jakob Foerster, Mihaela van der Schaar, Robert Tjarko Lange • Jun 12, 2024 • 170

AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation

Kai Wang, Shijian Deng, Jing Shi, Dimitrios Hatzinakos, Yapeng Tian • Jun 11, 2024 • 170

Hierarchical Patch Diffusion Models for High-Resolution Video Generation

Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin, Sergey Tulyakov • Jun 12, 2024 • 160

Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models

Yi-Fan Zhang, Qingsong Wen, Chaoyou Fu, Xue Wang, Zhang Zhang, Liang Wang, Rong Jin • Jun 12, 2024 • 142

VCR: Visual Caption Restoration

Tianyu Zhang, Suyuchen Wang, Lu Li, Ge Zhang, Perouz Taslakian, Sai Rajeswar, Jie Fu, Bang Liu, Yoshua Bengio • Jun 10, 2024 • 131

Large Language Model Unlearning via Embedding-Corrupted Prompts

Chris Yuhao Liu, Yaxuan Wang, Jeffrey Flanigan, Yang Liu • Jun 12, 2024 • 100

Chimera: Effectively Modeling Multivariate Time Series with 2-Dimensional State Space Models

Ali Behrouz, Michele Santacatterina, Ramin Zabih • Jun 6, 2024 • 101

Hibou: A Family of Foundational Vision Transformers for Pathology

Dmitry Nechaev, Alexey Pchelnikov, Ekaterina Ivanova • Jun 7, 2024 • 91

Simplified and Generalized Masked Diffusion for Discrete Data

Jiaxin Shi, Kehang Han, Zhe Wang, Arnaud Doucet, Michalis K. Titsias • Jun 6, 2024 • 70