ChatPaper.aiChatPaper.ai
Home

arXiv

HuggingFace

PricingAccountWorkSpace

•
•

•
•

•
•

•
•

•
•

Footer

Company name

ChatPaper.ai: Your advanced AI reading assistant.

Contact us: [email protected]

X (Twitter)

Products

  • AI Search
  • AI Mind Map
  • Arxiv Summary
  • Huggingface Summary

Support

  • FAQ
  • Contact

Company

  • Blog
  • Privacy Policy
  • Terms of Service

Available Languages

  • 🇬🇧English
  • 🇨🇳中文简体
  • 🇭🇰繁體中文
  • 🇯🇵日本語
  • 🇰🇷한국어
  • 🇩🇪Deutsch
  • 🇫🇷Français
  • 🇷🇺Русский
  • 🇪🇸Español

© 2025 chatpaper.ai All rights reserved.

AI Research Papers Daily

Daily curated AI research papers with translations

On Path to Multimodal Generalist: General-Level and General-Bench

Hao Fei, Yuan Zhou, Juncheng Li, Xiangtai Li, Qingshan Xu, Bobo Li, Shengqiong Wu, Yaoting Wang, Junbao Zhou, Jiahao Meng, Qingyu Shi, Zhiyuan Zhou, Liangtao Shi, Minghe Gao, Daoan Zhang, Zhiqi Ge, Weiming Wu, Siliang Tang, Kaihang Pan, Yaobo Ye, Haobo Yuan, Tao Zhang, Tianjie Ju, Zixiang Meng, Shilin Xu, Liyu Jia, Wentao Hu, Meng Luo, Jiebo Luo, Tat-Seng Chua, Shuicheng Yan, Hanwang Zhang•May 7, 2025•22

Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities

Xinjie Zhang, Jintao Guo, Shanshan Zhao, Minghao Fu, Lunhao Duan, Guo-Hua Wang, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang•May 5, 2025•684

ZeroSearch: Incentivize the Search Capability of LLMs without Searching

Hao Sun, Zile Qiao, Jiayan Guo, Xuanbo Fan, Yingyan Hou, Yong Jiang, Pengjun Xie, Fei Huang, Yan Zhang•May 7, 2025•608

HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation

Teng Hu, Zhentao Yu, Zhengguang Zhou, Sen Liang, Yuan Zhou, Qin Lin, Qinglin Lu•May 7, 2025•333

PrimitiveAnything: Human-Crafted 3D Primitive Assembly Generation with Auto-Regressive Transformer

Jingwen Ye, Yuze He, Yanning Zhou, Yiqin Zhu, Kaiwen Xiao, Yong-Jin Liu, Wei Yang, Xiao Han•May 7, 2025•251

Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models

Gracjan Góral, Alicja Ziarko, Piotr Miłoś, Michał Nauman, Maciej Wołczyk, Michał Kosiński•May 3, 2025•221

OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning

Xianhang Li, Yanqing Liu, Haoqin Tu, Hongru Zhu, Cihang Xie•May 7, 2025•211

R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training

Albert Ge, Tzu-Heng Huang, John Cooper, Avi Trost, Ziyi Chu, Satya Sai Srinath Namburi GNVV, Ziyang Cai, Kendall Park, Nicholas Roberts, Frederic Sala•May 1, 2025•201

Benchmarking LLMs' Swarm intelligence

Kai Ruan, Mowen Huang, Ji-Rong Wen, Hao Sun•May 7, 2025•180

LLM-Independent Adaptive RAG: Let the Question Speak for Itself

Maria Marina, Nikolay Ivanov, Sergey Pletenev, Mikhail Salnikov, Daria Galimzianova, Nikita Krayko, Vasily Konovalov, Alexander Panchenko, Viktor Moskvoretskii•May 7, 2025•111

Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving

Qi Liu, Xinhao Zheng, Renqiu Xia, Xingzhi Qi, Qinxiang Cao, Junchi Yan•May 7, 2025•101

OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation

Can Cui, Pengxiang Ding, Wenxuan Song, Shuanghao Bai, Xinyang Tong, Zirui Ge, Runze Suo, Wanqi Zhou, Yang Liu, Bofang Jia, Han Zhao, Siteng Huang, Donglin Wang•May 6, 2025•81

Knowledge Augmented Complex Problem Solving with Large Language Models: A Survey

Da Zheng, Lun Du, Junwei Su, Yuchen Tian, Yuqi Zhu, Jintian Zhang, Lanning Wei, Ningyu Zhang, Huajun Chen•May 6, 2025•81

OSUniverse: Benchmark for Multimodal GUI-navigation AI Agents

Mariya Davydova, Daniel Jeffries, Patrick Barker, Arturo Márquez Flores, Sinéad Ryan•May 6, 2025•71

OmniGIRL: A Multilingual and Multimodal Benchmark for GitHub Issue Resolution

Lianghong Guo, Wei Tao, Runhan Jiang, Yanlin Wang, Jiachi Chen, Xilin Liu, Yuchi Ma, Mingzhi Mao, Hongyu Zhang, Zibin Zheng•May 7, 2025•61

AutoLibra: Agent Metric Induction from Open-Ended Feedback

Hao Zhu, Phil Cuvin, Xinkai Yu, Charlotte Ka Yee Yan, Jason Zhang, Diyi Yang•May 5, 2025•32

Uncertainty-Weighted Image-Event Multimodal Fusion for Video Anomaly Detection

Sungheon Jeong, Jihong Park, Mohsen Imani•May 5, 2025•31

COSMOS: Predictable and Cost-Effective Adaptation of LLMs

Jiayu Wang, Aws Albarghouthi, Frederic Sala•Apr 30, 2025•31

RAIL: Region-Aware Instructive Learning for Semi-Supervised Tooth Segmentation in CBCT

Chuyu Zhao, Hao Huang, Jiashuo Guo, Ziyu Shen, Zhongwei Zhou, Jie Liu, Zekuan Yu•May 6, 2025•21

Cognitio Emergens: Agency, Dimensions, and Dynamics in Human-AI Knowledge Co-Creation

Xule Lin•May 6, 2025•11