ChatPaper.aiChatPaper.ai
Home

arXiv

HuggingFace

PricingAccountWorkSpace

•
•

•
•

•
•

•
•

•
•

Footer

Company name

ChatPaper.ai: Your advanced AI reading assistant.

Contact us: [email protected]

X (Twitter)

Products

  • AI Search
  • AI Mind Map
  • Arxiv Summary
  • Huggingface Summary

Support

  • FAQ
  • Contact

Company

  • Blog
  • Privacy Policy
  • Terms of Service

Available Languages

  • 🇬🇧English
  • 🇨🇳中文简体
  • 🇭🇰繁體中文
  • 🇯🇵日本語
  • 🇰🇷한국어
  • 🇩🇪Deutsch
  • 🇫🇷Français
  • 🇷🇺Русский
  • 🇪🇸Español

© 2025 chatpaper.ai All rights reserved.

AI Research Papers Daily

Daily curated AI research papers with translations

On Path to Multimodal Generalist: General-Level and General-Bench

Hao Fei, Yuan Zhou, Juncheng Li, Xiangtai Li, Qingshan Xu, Bobo Li, Shengqiong Wu, Yaoting Wang, Junbao Zhou, Jiahao Meng, Qingyu Shi, Zhiyuan Zhou, Liangtao Shi, Minghe Gao, Daoan Zhang, Zhiqi Ge, Weiming Wu, Siliang Tang, Kaihang Pan, Yaobo Ye, Haobo Yuan, Tao Zhang, Tianjie Ju, Zixiang Meng, Shilin Xu, Liyu Jia, Wentao Hu, Meng Luo, Jiebo Luo, Tat-Seng Chua, Shuicheng Yan, Hanwang Zhang•May 7, 2025•22

Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities

Xinjie Zhang, Jintao Guo, Shanshan Zhao, Minghao Fu, Lunhao Duan, Guo-Hua Wang, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang•May 5, 2025•584

ZeroSearch: Incentivize the Search Capability of LLMs without Searching

Hao Sun, Zile Qiao, Jiayan Guo, Xuanbo Fan, Yingyan Hou, Yong Jiang, Pengjun Xie, Fei Huang, Yan Zhang•May 7, 2025•372

HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation

Teng Hu, Zhentao Yu, Zhengguang Zhou, Sen Liang, Yuan Zhou, Qin Lin, Qinglin Lu•May 7, 2025•202

Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models

Gracjan Góral, Alicja Ziarko, Piotr Miłoś, Michał Nauman, Maciej Wołczyk, Michał Kosiński•May 3, 2025•201

PrimitiveAnything: Human-Crafted 3D Primitive Assembly Generation with Auto-Regressive Transformer

Jingwen Ye, Yuze He, Yanning Zhou, Yiqin Zhu, Kaiwen Xiao, Yong-Jin Liu, Wei Yang, Xiao Han•May 7, 2025•171

R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training

Albert Ge, Tzu-Heng Huang, John Cooper, Avi Trost, Ziyi Chu, Satya Sai Srinath Namburi GNVV, Ziyang Cai, Kendall Park, Nicholas Roberts, Frederic Sala•May 1, 2025•171

Benchmarking LLMs' Swarm intelligence

Kai Ruan, Mowen Huang, Ji-Rong Wen, Hao Sun•May 7, 2025•130

Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving

Qi Liu, Xinhao Zheng, Renqiu Xia, Xingzhi Qi, Qinxiang Cao, Junchi Yan•May 7, 2025•101

OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning

Xianhang Li, Yanqing Liu, Haoqin Tu, Hongru Zhu, Cihang Xie•May 7, 2025•81

OmniGIRL: A Multilingual and Multimodal Benchmark for GitHub Issue Resolution

Lianghong Guo, Wei Tao, Runhan Jiang, Yanlin Wang, Jiachi Chen, Xilin Liu, Yuchi Ma, Mingzhi Mao, Hongyu Zhang, Zibin Zheng•May 7, 2025•61

OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation

Can Cui, Pengxiang Ding, Wenxuan Song, Shuanghao Bai, Xinyang Tong, Zirui Ge, Runze Suo, Wanqi Zhou, Yang Liu, Bofang Jia, Han Zhao, Siteng Huang, Donglin Wang•May 6, 2025•61

OSUniverse: Benchmark for Multimodal GUI-navigation AI Agents

Mariya Davydova, Daniel Jeffries, Patrick Barker, Arturo Márquez Flores, Sinéad Ryan•May 6, 2025•51

LLM-Independent Adaptive RAG: Let the Question Speak for Itself

Maria Marina, Nikolay Ivanov, Sergey Pletenev, Mikhail Salnikov, Daria Galimzianova, Nikita Krayko, Vasily Konovalov, Alexander Panchenko, Viktor Moskvoretskii•May 7, 2025•41

Knowledge Augmented Complex Problem Solving with Large Language Models: A Survey

Da Zheng, Lun Du, Junwei Su, Yuchen Tian, Yuqi Zhu, Jintian Zhang, Lanning Wei, Ningyu Zhang, Huajun Chen•May 6, 2025•41

RAIL: Region-Aware Instructive Learning for Semi-Supervised Tooth Segmentation in CBCT

Chuyu Zhao, Hao Huang, Jiashuo Guo, Ziyu Shen, Zhongwei Zhou, Jie Liu, Zekuan Yu•May 6, 2025•21

AutoLibra: Agent Metric Induction from Open-Ended Feedback

Hao Zhu, Phil Cuvin, Xinkai Yu, Charlotte Ka Yee Yan, Jason Zhang, Diyi Yang•May 5, 2025•22

Uncertainty-Weighted Image-Event Multimodal Fusion for Video Anomaly Detection

Sungheon Jeong, Jihong Park, Mohsen Imani•May 5, 2025•21

Cognitio Emergens: Agency, Dimensions, and Dynamics in Human-AI Knowledge Co-Creation

Xule Lin•May 6, 2025•11

COSMOS: Predictable and Cost-Effective Adaptation of LLMs

Jiayu Wang, Aws Albarghouthi, Frederic Sala•Apr 30, 2025•11