ChatPaper.ai

Daily AI Research Paper Picks

A daily selection of AI research papers, with translations

NovelSeek:當智能體化身科學家——從假設到驗證的閉環系統構建
NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification

NovelSeek Team, Bo Zhang, Shiyang Feng, Xiangchao Yan, Jiakang Yuan, Zhiyin Yu, Xiaohan He, Songtao Huang, Shaowei Hou, Zheng Nie, Zhilong Wang, Jinyao Liu, Runmin Ma, Tianshuo Peng, Peng Ye, Dongzhan Zhou, Shufei Zhang, Xiaosong Wang, Yilan Zhang, Meng Li, Zhongying Tu, Xiangyu Yue, Wanli Ouyang, Bowen Zhou, Lei Bai•May 22, 2025•851

規模化推理,控制力下降:評估大型推理模型中的指令遵循能力
Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models

Tingchen Fu, Jiawei Gu, Yafu Li, Xiaoye Qu, Yu Cheng•May 20, 2025•492

Tool-Star:透過強化學習賦能具備多工具推理能力的大型語言模型
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning

Guanting Dong, Yifei Chen, Xiaoxi Li, Jiajie Jin, Hongjin Qian, Yutao Zhu, Hangyu Mao, Guorui Zhou, Zhicheng Dou, Ji-Rong Wen•May 22, 2025•432

像素推理者:通過好奇心驅動的強化學習激勵像素空間推理
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning

Alex Su, Haozhe Wang, Weimin Ren, Fangzhen Lin, Wenhu Chen•May 21, 2025•372

KRIS-Bench:下一代智能圖像編輯模型基準測試
KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models

Yongliang Wu, Zonghui Li, Xinting Hu, Xinyu Ye, Xianfang Zeng, Gang Yu, Wenbo Zhu, Bernt Schiele, Ming-Hsuan Yang, Xu Yang•May 22, 2025•362

QuickVideo:系統算法協同設計實現的實時長視頻理解
QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design

Benjamin Schneider, Dongfu Jiang, Chao Du, Tianyu Pang, Wenhu Chen•May 22, 2025•302

GoT-R1:透過強化學習釋放多模態大語言模型的視覺生成推理能力
GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning

Chengqi Duan, Rongyao Fang, Yuqing Wang, Kun Wang, Linjiang Huang, Xingyu Zeng, Hongsheng Li, Xihui Liu•May 22, 2025•232

LLaDA-V:具備視覺指令微調的大型語言擴散模型
LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning

Zebin You, Shen Nie, Xiaolu Zhang, Jun Hu, Jun Zhou, Zhiwu Lu, Ji-Rong Wen, Chongxuan Li•May 22, 2025•223

透過μP高效擴展擴散變換器
Scaling Diffusion Transformers Efficiently via μP

Chenyu Zheng, Xinyu Zhang, Rongzhen Wang, Wei Huang, Zhi Tian, Weilin Huang, Jun Zhu, Chongxuan Li•May 21, 2025•212

基於伊藤-齋藤損失的風險規避強化學習
Risk-Averse Reinforcement Learning with Itakura-Saito Loss

Igor Udovichenko, Olivier Croissant, Anita Toleutaeva, Evgeny Burnaev, Alexander Korotin•May 22, 2025•202

理解生成式AI在日常圖像編輯任務中的能力
Understanding Generative AI Capabilities in Everyday Image Editing Tasks

Mohammad Reza Taesiri, Brandon Collins, Logan Bolton, Viet Dac Lai, Franck Dernoncourt, Trung Bui, Anh Totti Nguyen•May 22, 2025•202

AceReason-Nemotron:透過強化學習推進數學與程式碼推理
AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning

Yang Chen, Zhuolin Yang, Zihan Liu, Chankyu Lee, Peng Xu, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping•May 22, 2025•182

留意差距:彌合思維躍遷以改進思維鏈微調
Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning

Haolei Xu, Yuchen Yan, Yongliang Shen, Wenqi Zhang, Guiyang Hou, Shengpei Jiang, Kaitao Song, Weiming Lu, Jun Xiao, Yueting Zhuang•May 20, 2025•181

讓大型語言模型通過自我制動調校擺脫過度思考的束縛
Let LLMs Break Free from Overthinking via Self-Braking Tuning

Haoran Zhao, Yuchen Yan, Yongliang Shen, Haolei Xu, Wenqi Zhang, Kaitao Song, Jian Shao, Weiming Lu, Jun Xiao, Yueting Zhuang•May 20, 2025•182

VideoGameQA-Bench:評估視覺語言模型於電玩遊戲品質保證之效能
VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance

Mohammad Reza Taesiri, Abhijay Ghildyal, Saman Zadtootaghaj, Nabajeet Barman, Cor-Paul Bezemer•May 21, 2025•172

Dimple:具備平行解碼能力的離散擴散多模態大型語言模型
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding

Runpeng Yu, Xinyin Ma, Xinchao Wang•May 22, 2025•142

多模態大語言模型微調中的無外部指導後門清理
Backdoor Cleaning without External Guidance in MLLM Fine-tuning

Xuankun Rong, Wenke Huang, Jian Liang, Jinhe Bi, Xun Xiao, Yiming Li, Bo Du, Mang Ye•May 22, 2025•142

SophiaVL-R1:以思考獎勵強化多模態大語言模型的推理能力
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward

Kaixuan Fan, Kaituo Feng, Haoming Lyu, Dongzhan Zhou, Xiangyu Yue•May 22, 2025•122

修復損害效能的數據:級聯大型語言模型重新標記困難負樣本以實現穩健的資訊檢索
Fixing Data That Hurts Performance: Cascading LLMs to Relabel Hard Negatives for Robust Information Retrieval

Nandan Thakur, Crystina Zhang, Xueguang Ma, Jimmy Lin•May 22, 2025•123

無需訓練的高效視頻生成:基於動態令牌雕刻技術
Training-Free Efficient Video Generation via Dynamic Token Carving

Yuechen Zhang, Jinbo Xing, Bin Xia, Shaoteng Liu, Bohao Peng, Xin Tao, Pengfei Wan, Eric Lo, Jiaya Jia•May 22, 2025•122

SpatialScore:邁向多模態空間理解的統一評估框架
SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding

Haoning Wu, Xiao Huang, Yaohui Chen, Ya Zhang, Yanfeng Wang, Weidi Xie•May 22, 2025•102

LaViDa:面向多模態理解的大型擴散語言模型
LaViDa: A Large Diffusion Language Model for Multimodal Understanding

Shufan Li, Konstantinos Kallidromitis, Hritik Bansal, Akash Gokul, Yusuke Kato, Kazuki Kozuka, Jason Kuen, Zhe Lin, Kai-Wei Chang, Aditya Grover•May 22, 2025•102

TinyV:降低驗證中的假陰性提升大型語言模型的強化學習推理能力
TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning

Zhangchen Xu, Yuetai Li, Fengqing Jiang, Bhaskar Ramasubramanian, Luyao Niu, Bill Yuchen Lin, Radha Poovendran•May 20, 2025•102

思考與否?基於強化學習的視覺語言模型選擇性推理
Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models

Jiaqi Wang, Kevin Qinghong Lin, James Cheng, Mike Zheng Shou•May 22, 2025•72

WebAgent-R1:通過端到端多輪強化學習訓練網絡代理
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning

Zhepei Wei, Wenlin Yao, Yao Liu, Weizhi Zhang, Qin Lu, Liang Qiu, Changlong Yu, Puyang Xu, Chao Zhang, Bing Yin, Hyokun Yun, Lihong Li•May 22, 2025•72

多模態大語言模型中的無訓練推理與反思
Training-Free Reasoning and Reflection in MLLMs

Hongchen Wei, Zhenzhong Chen•May 22, 2025•73

GRIT:教導多模態大型語言模型以圖像思考
GRIT: Teaching MLLMs to Think with Images

Yue Fan, Xuehai He, Diji Yang, Kaizhi Zheng, Ching-Chen Kuo, Yuting Zheng, Sravana Jyothi Narayanaraju, Xinze Guan, Xin Eric Wang•May 21, 2025•72

AGENTIF:大型語言模型在代理場景中的指令遵循基準測試
AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios

Yunjia Qi, Hao Peng, Xiaozhi Wang, Amy Xin, Youfeng Liu, Bin Xu, Lei Hou, Juanzi Li•May 22, 2025•62

VLM-R^3:區域識別、推理與精煉,強化多模態思維鏈
VLM-R^3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought

Chaoya Jiang, Yongrui Heng, Wei Ye, Han Yang, Haiyang Xu, Ming Yan, Ji Zhang, Fei Huang, Shikun Zhang•May 22, 2025•62

OViP:線上視覺語言偏好學習
OViP: Online Vision-Language Preference Learning

Shujun Liu, Siyuan Wang, Zejun Li, Jianxiang Wang, Cheng Zeng, Zhongyu Wei•May 21, 2025•62

訓練步驟級推理驗證器與形式驗證工具
Training Step-Level Reasoning Verifiers with Formal Verification Tools

Ryo Kamoi, Yusen Zhang, Nan Zhang, Sarkar Snigdha Sarathi Das, Rui Zhang•May 21, 2025•62

SafeKey:強化安全推理中的頓悟洞察
SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning

Kaiwen Zhou, Xuandong Zhao, Gaowen Liu, Jayanth Srinivasa, Aosong Feng, Dawn Song, Xin Eric Wang•May 22, 2025•52

強化學習微調大型語言模型中的小型子網路
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models

Sagnik Mukherjee, Lifan Yuan, Dilek Hakkani-Tur, Hao Peng•May 16, 2025•52

Think-RM:實現生成式獎勵模型中的長程推理能力
Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models

Ilgee Hong, Changlong Yu, Liang Qiu, Weixiang Yan, Zhenghao Xu, Haoming Jiang, Qingru Zhang, Qin Lu, Xin Liu, Chao Zhang, Tuo Zhao•May 22, 2025•42

讓仿生人夢見電子羊:一個類人化的圖像隱喻理解與推理框架
Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework

Chenhao Zhang, Yazhe Niu•May 22, 2025•33

多空間MLLM:基於多模態大語言模型的多幀空間理解
Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models

Runsen Xu, Weiyao Wang, Hao Tang, Xingyu Chen, Xiaodong Wang, Fu-Jen Chu, Dahua Lin, Matt Feiszli, Kevin J. Liang•May 22, 2025•32

Robo2VLM:基於大規模真實環境機器人操作數據集的視覺問答系統
Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets

Kaiyuan Chen, Shuangyu Xie, Zehan Ma, Ken Goldberg•May 21, 2025•32

引導大型語言模型實現機器翻譯個性化
Steering Large Language Models for Machine Translation Personalization

Daniel Scalena, Gabriele Sarti, Arianna Bisazza, Elisabetta Fersini, Malvina Nissim•May 22, 2025•22

大型語言模型何時會承認錯誤?探討模型信念在撤回中的角色
When Do LLMs Admit Their Mistakes? Understanding the Role of Model Belief in Retraction

Yuqing Yang, Robin Jia•May 22, 2025•22

日期片段:時間推理中分詞處理的隱藏瓶頸
Date Fragments: A Hidden Bottleneck of Tokenization for Temporal Reasoning

Gagan Bhatia, Maxime Peyrard, Wei Zhao•May 22, 2025•22

大型視覺語言模型如何識別圖像中的文字?揭示OCR頭部的獨特作用
How Do Large Vision-Language Models See Text in Image? Unveiling the Distinctive Role of OCR Heads

Ingeol Baek, Hwan Chang, Sunghyun Ryu, Hwanhee Lee•May 21, 2025•22

RAVENEA:多模态检索增强视觉文化理解的基准
RAVENEA: A Benchmark for Multimodal Retrieval-Augmented Visual Culture Understanding

Jiaang Li, Yifei Yuan, Wenyan Li, Mohammad Aliannejadi, Daniel Hershcovich, Anders Søgaard, Ivan Vulić, Wenxuan Zhang, Paul Pu Liang, Yang Deng, Serge Belongie•May 20, 2025•22

MUG-Eval:面向任意語言的多語言生成能力代理評估框架
MUG-Eval: A Proxy Evaluation Framework for Multilingual Generation Capabilities in Any Language

Seyoung Song, Seogyeong Jeong, Eunsu Kim, Jiho Jin, Dongkwan Kim, Jay Shin, Alice Oh•May 20, 2025•22

RoPECraft:基於軌跡引導RoPE優化的無訓練運動遷移於擴散變壓器
RoPECraft: Training-Free Motion Transfer with Trajectory-Guided RoPE Optimization on Diffusion Transformers

Ahmet Berke Gokmen, Yigit Ekin, Bahri Batuhan Bilecen, Aysegul Dundar•May 19, 2025•22

SPhyR:材料分佈的空間物理推理基準測試
SPhyR: Spatial-Physical Reasoning Benchmark on Material Distribution

Philipp D. Siedler•May 21, 2025•12

gen2seg:生成模型實現可泛化的實例分割
gen2seg: Generative Models Enable Generalizable Instance Segmentation

Om Khangaonkar, Hamed Pirsiavash•May 21, 2025•12

SAKURA:基於語音與音頻信息的大型音頻-語言模型的多跳推理研究
SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information

Chih-Kai Yang, Neo Ho, Yen-Ting Piao, Hung-yi Lee•May 19, 2025•02