ChatPaper.aiChatPaper.ai
首页

arXiv

HuggingFace

定价账户工作台

•
•

•
•

•
•

•
•

•
•

Footer

Company name

ChatPaper.ai: Your advanced AI reading assistant.

Contact us: [email protected]

X (Twitter)

Products

  • AI Search
  • AI Mind Map
  • Arxiv Summary
  • Huggingface Summary

Support

  • FAQ
  • Contact

Company

  • Blog
  • Privacy Policy
  • Terms of Service

Available Languages

  • 🇬🇧English
  • 🇨🇳中文简体
  • 🇭🇰繁體中文
  • 🇯🇵日本語
  • 🇰🇷한국어
  • 🇩🇪Deutsch
  • 🇫🇷Français
  • 🇷🇺Русский
  • 🇪🇸Español

© 2025 chatpaper.ai All rights reserved.

AI研究论文每日精选

每日精选AI研究论文及翻译

迈向多模态通用智能之路:通用层级与通用基准
On Path to Multimodal Generalist: General-Level and General-Bench

Hao Fei, Yuan Zhou, Juncheng Li, Xiangtai Li, Qingshan Xu, Bobo Li, Shengqiong Wu, Yaoting Wang, Junbao Zhou, Jiahao Meng, Qingyu Shi, Zhiyuan Zhou, Liangtao Shi, Minghe Gao, Daoan Zhang, Zhiqi Ge, Weiming Wu, Siliang Tang, Kaihang Pan, Yaobo Ye, Haobo Yuan, Tao Zhang, Tianjie Ju, Zixiang Meng, Shilin Xu, Liyu Jia, Wentao Hu, Meng Luo, Jiebo Luo, Tat-Seng Chua, Shuicheng Yan, Hanwang Zhang•May 7, 2025•22

统一多模态理解与生成模型:进展、挑战与机遇
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities

Xinjie Zhang, Jintao Guo, Shanshan Zhao, Minghao Fu, Lunhao Duan, Guo-Hua Wang, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang•May 5, 2025•584

零搜索:无需搜索即可激发大语言模型的检索能力
ZeroSearch: Incentivize the Search Capability of LLMs without Searching

Hao Sun, Zile Qiao, Jiayan Guo, Xuanbo Fan, Yingyan Hou, Yong Jiang, Pengjun Xie, Fei Huang, Yan Zhang•May 7, 2025•362

HunyuanCustom:面向定制化视频生成的多模态驱动架构
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation

Teng Hu, Zhentao Yu, Zhengguang Zhou, Sen Liang, Yuan Zhou, Qin Lin, Qinglin Lu•May 7, 2025•192

超越识别:评估视觉语言模型中的视觉视角理解能力
Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models

Gracjan Góral, Alicja Ziarko, Piotr Miłoś, Michał Nauman, Maciej Wołczyk, Michał Kosiński•May 3, 2025•191

R&B:面向高效基础模型训练的领域重组与数据混合平衡
R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training

Albert Ge, Tzu-Heng Huang, John Cooper, Avi Trost, Ziyi Chu, Satya Sai Srinath Namburi GNVV, Ziyang Cai, Kendall Park, Nicholas Roberts, Frederic Sala•May 1, 2025•171

PrimitiveAnything:基于自回归Transformer的人类手工3D基元组装生成
PrimitiveAnything: Human-Crafted 3D Primitive Assembly Generation with Auto-Regressive Transformer

Jingwen Ye, Yuze He, Yanning Zhou, Yiqin Zhu, Kaiwen Xiao, Yong-Jin Liu, Wei Yang, Xiao Han•May 7, 2025•151

大语言模型群体智能基准测试
Benchmarking LLMs' Swarm intelligence

Kai Ruan, Mowen Huang, Ji-Rong Wen, Hao Sun•May 7, 2025•130

超越定理证明:形式化问题求解的构建、框架与基准
Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving

Qi Liu, Xinhao Zheng, Renqiu Xia, Xingzhi Qi, Qinxiang Cao, Junchi Yan•May 7, 2025•91

OmniGIRL:面向GitHub问题解决的多语言多模态基准测试平台
OmniGIRL: A Multilingual and Multimodal Benchmark for GitHub Issue Resolution

Lianghong Guo, Wei Tao, Runhan Jiang, Yanlin Wang, Jiachi Chen, Xilin Liu, Yuchi Ma, Mingzhi Mao, Hongyu Zhang, Zibin Zheng•May 7, 2025•61

OpenVision:一个完全开源、经济高效的先进视觉编码器家族,面向多模态学习
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning

Xianhang Li, Yanqing Liu, Haoqin Tu, Hongru Zhu, Cihang Xie•May 7, 2025•61

OpenHelix:机器人操作领域的简要综述、实证分析及开源双系统VLA模型
OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation

Can Cui, Pengxiang Ding, Wenxuan Song, Shuanghao Bai, Xinyang Tong, Zirui Ge, Runze Suo, Wanqi Zhou, Yang Liu, Bofang Jia, Han Zhao, Siteng Huang, Donglin Wang•May 6, 2025•61

OSUniverse:多模态GUI导航AI智能体基准测试平台
OSUniverse: Benchmark for Multimodal GUI-navigation AI Agents

Mariya Davydova, Daniel Jeffries, Patrick Barker, Arturo Márquez Flores, Sinéad Ryan•May 6, 2025•51

独立于大语言模型的自适应RAG:让问题自我表达
LLM-Independent Adaptive RAG: Let the Question Speak for Itself

Maria Marina, Nikolay Ivanov, Sergey Pletenev, Mikhail Salnikov, Daria Galimzianova, Nikita Krayko, Vasily Konovalov, Alexander Panchenko, Viktor Moskvoretskii•May 7, 2025•41

知识增强型大型语言模型在复杂问题解决中的应用:综述
Knowledge Augmented Complex Problem Solving with Large Language Models: A Survey

Da Zheng, Lun Du, Junwei Su, Yuchen Tian, Yuqi Zhu, Jintian Zhang, Lanning Wei, Ningyu Zhang, Huajun Chen•May 6, 2025•41

RAIL:面向CBCT半监督牙齿分割的区域感知指导学习
RAIL: Region-Aware Instructive Learning for Semi-Supervised Tooth Segmentation in CBCT

Chuyu Zhao, Hao Huang, Jiashuo Guo, Ziyu Shen, Zhongwei Zhou, Jie Liu, Zekuan Yu•May 6, 2025•21

AutoLibra:基于开放式反馈的智能体指标归纳
AutoLibra: Agent Metric Induction from Open-Ended Feedback

Hao Zhu, Phil Cuvin, Xinkai Yu, Charlotte Ka Yee Yan, Jason Zhang, Diyi Yang•May 5, 2025•22

基于不确定性加权的图像-事件多模态融合视频异常检测
Uncertainty-Weighted Image-Event Multimodal Fusion for Video Anomaly Detection

Sungheon Jeong, Jihong Park, Mohsen Imani•May 5, 2025•21

认知涌现:人类与AI知识共创中的能动性、维度与动态机制
Cognitio Emergens: Agency, Dimensions, and Dynamics in Human-AI Knowledge Co-Creation

Xule Lin•May 6, 2025•11

COSMOS:大语言模型的可预测且经济高效的适配方案
COSMOS: Predictable and Cost-Effective Adaptation of LLMs

Jiayu Wang, Aws Albarghouthi, Frederic Sala•Apr 30, 2025•11