ChatPaper.ai
Daily AI Research Paper Picks
Daily curated AI research papers with translations
June 4th, 2025
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Shelly Bensal, Umar Jamil, Christopher Bryant, Melisa Russak, Kiran Kamble, Dmytro Mozolevskyi, Muayad Ali, Waseem AlShikh
•
May 30, 2025
•
168
4
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
Bin Lin, Zongjian Li, Xinhua Cheng, Yuwei Niu, Yang Ye, Xianyi He, Shenghai Yuan, Wangbo Yu, Shaodong Wang, Yunyang Ge, Yatian Pang, Li Yuan
•
Jun 3, 2025
•
55
2
VS-Bench: Evaluating VLMs for Strategic Reasoning and Decision-Making in Multi-Agent Environments
Zelai Xu, Zhexuan Xu, Xiangmin Yi, Huining Yuan, Xinlei Chen, Yi Wu, Chao Yu, Yu Wang
•
Jun 3, 2025
•
55
3
SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis
Zijian Wu, Jinjie Ni, Xiangyan Liu, Zichen Liu, Hang Yan, Michael Qizhe Shieh
•
Jun 2, 2025
•
49
2
CSVQA: A Chinese Multimodal Benchmark for Evaluating STEM Reasoning Capabilities of VLMs
Ai Jian, Weijie Qiu, Xiaokun Wang, Peiyu Wang, Yunzhuo Hao, Jiangbo Pei, Yichen Wei, Yi Peng, Xuchen Song
•
May 30, 2025
•
47
4
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
Qianhui Wu, Kanzhi Cheng, Rui Yang, Chaoyun Zhang, Jianwei Yang, Huiqiang Jiang, Jian Mu, Baolin Peng, Bo Qiao, Reuben Tan, Si Qin, Lars Liden, Qingwei Lin, Huan Zhang, Tong Zhang, Jianbing Zhang, Dongmei Zhang, Jianfeng Gao
•
Jun 3, 2025
•
37
3
FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation
Junyu Luo, Zhizhuo Kou, Liming Yang, Xiao Luo, Jinsheng Huang, Zhiping Xiao, Jingshu Peng, Chengzhong Liu, Jiaming Ji, Xuanzhe Liu, Sirui Han, Ming Zhang, Yike Guo
•
May 30, 2025
•
34
3
OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
Mengdi Jia, Zekun Qi, Shaochen Zhang, Wenyao Zhang, Xinqiang Yu, Jiawei He, He Wang, Li Yi
•
Jun 3, 2025
•
33
2
OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation
Shengjia Zhang, Junjie Wu, Jiawei Chen, Changwang Zhang, Xingyu Lou, Wangchunshu Zhou, Sheng Zhou, Can Wang, Jun Wang
•
Jun 3, 2025
•
33
2
Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces
Gen Luo, Ganlin Yang, Ziyang Gong, Guanzhou Chen, Haonan Duan, Erfei Cui, Ronglei Tong, Zhi Hou, Tianyi Zhang, Zhe Chen, Shenglong Ye, Lewei Lu, Jingbo Wang, Wenhai Wang, Jifeng Dai, Yu Qiao, Rongrong Ji, Xizhou Zhu
•
May 30, 2025
•
32
5
Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers
Pengtao Chen, Xianfang Zeng, Maosen Zhao, Peng Ye, Mingzhu Shen, Wei Cheng, Gang Yu, Tao Chen
•
Jun 3, 2025
•
27
2
DINGO: Constrained Inference for Diffusion LLMs
Tarun Suresh, Debangshu Banerjee, Shubham Ugare, Sasa Misailovic, Gagandeep Singh
•
May 29, 2025
•
26
2
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics
Dongyoung Kim, Sumin Park, Huiwon Jang, Jinwoo Shin, Jaehyung Kim, Younggyo Seo
•
May 29, 2025
•
25
2
MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs
Yipeng Du, Tiehan Fan, Kepan Nan, Rui Xie, Penghao Zhou, Xiang Li, Jian Yang, Zhenheng Yang, Ying Tai
•
Jun 2, 2025
•
24
2
Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning
Yinjie Wang, Ling Yang, Ye Tian, Ke Shen, Mengdi Wang
•
Jun 3, 2025
•
22
2
AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation
Lu Qiu, Yizhuo Li, Yuying Ge, Yixiao Ge, Ying Shan, Xihui Liu
•
Jun 3, 2025
•
22
2
Negative-Guided Subject Fidelity Optimization for Zero-Shot Subject-Driven Generation
Chaehun Shin, Jooyoung Choi, Johan Barthelemy, Jungbeom Lee, Sungroh Yoon
•
Jun 4, 2025
•
21
2
LumosFlow: Motion-Guided Long Video Generation
Jiahao Chen, Hangjie Yuan, Yichen Qian, Jingyun Liang, Jiazheng Xing, Pengwei Liu, Weihua Chen, Fan Wang, Bing Su
•
Jun 3, 2025
•
18
2
Native-Resolution Image Synthesis
Zidong Wang, Lei Bai, Xiangyu Yue, Wanli Ouyang, Yiyuan Zhang
•
Jun 3, 2025
•
17
3
RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers
Yan Gong, Yiren Song, Yicheng Li, Chenglin Li, Yin Zhang
•
Jun 3, 2025
•
15
2
DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation
Zhengyao Lv, Chenyang Si, Tianlin Pan, Zhaoxi Chen, Kwan-Yee K. Wong, Yu Qiao, Ziwei Liu
•
Jun 3, 2025
•
14
2
FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video Generation
Ariel Shaulov, Itay Hazan, Lior Wolf, Hila Chefer
•
Jun 1, 2025
•
14
2
Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability
Genta Indra Winata, David Anugraha, Emmy Liu, Alham Fikri Aji, Shou-Yi Hung, Aditya Parashar, Patrick Amadeus Irawan, Ruochen Zhang, Zheng-Xin Yong, Jan Christian Blaise Cruz, Niklas Muennighoff, Seungone Kim, Hanyang Zhao, Sudipta Kar, Kezia Erina Suryoraharjo, M. Farid Adilazuarda, En-Shiun Annie Lee, Ayu Purwarianti, Derry Tanti Wijaya, Monojit Choudhury
•
Jun 2, 2025
•
12
2
PCoreSet: Effective Active Learning through Knowledge Distillation from Vision-Language Models
Seongjae Kang, Dong Bok Lee, Hyungjoon Jang, Dongseop Kim, Sung Ju Hwang
•
Jun 1, 2025
•
10
3
Ctrl-Crash: Controllable Diffusion for Realistic Car Crashes
Anthony Gosselin, Ge Ya Luo, Luis Lara, Florian Golemo, Derek Nowrouzezahrai, Liam Paull, Alexia Jolicoeur-Martineau, Christopher Pal
•
May 30, 2025
•
10
3
Training Language Models to Generate Quality Code with Program Analysis Feedback
Feng Yao, Zilong Wang, Liyuan Liu, Junxia Cui, Li Zhong, Xiaohan Fu, Haohui Mai, Vish Krishnan, Jianfeng Gao, Jingbo Shang
•
May 28, 2025
•
9
4
Self-Challenging Language Model Agents
Yifei Zhou, Sergey Levine, Jason Weston, Xian Li, Sainbayar Sukhbaatar
•
Jun 2, 2025
•
8
2
Motion-Aware Concept Alignment for Consistent Video Editing
Tong Zhang, Juan C Leon Alcazar, Bernard Ghanem
•
Jun 1, 2025
•
7
2
ORV: 4D Occupancy-centric Robot Video Generation
Xiuyu Yang, Bohan Li, Shaocong Xu, Nan Wang, Chongjie Ye, Zhaoxi Chen, Minghan Qin, Yikang Ding, Xin Jin, Hang Zhao, Hao Zhao
•
Jun 3, 2025
•
6
2
Accelerating Diffusion LLMs via Adaptive Parallel Decoding
Daniel Israel, Guy Van den Broeck, Aditya Grover
•
May 31, 2025
•
6
2
MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query
Wei Chow, Yuan Gao, Linfeng Li, Xian Wang, Qi Xu, Hang Song, Lingdong Kong, Ran Zhou, Yi Zeng, Yidong Cai, Botian Jiang, Shilin Xu, Jiajun Zhang, Minghui Qiu, Xiangtai Li, Tianshu Yang, Siliang Tang, Juncheng Li
•
Jun 3, 2025
•
3
2
FuseLIP: Multimodal Embeddings via Early Fusion of Discrete Tokens
Christian Schlarmann, Francesco Croce, Nicolas Flammarion, Matthias Hein
•
Jun 3, 2025
•
3
2
Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework
Zhaorui Yang, Bo Pan, Han Wang, Yiyao Wang, Xingyu Liu, Minfeng Zhu, Bo Zhang, Wei Chen
•
Jun 3, 2025
•
3
2
One Missing Piece for Open-Source Reasoning Models: A Dataset to Mitigate Cold-Starting Short CoT LLMs in RL
Hyungjoo Chae, Dongjin Kang, Jihyuk Kim, Beong-woo Kwak, Sunghyun Park, Haeju Park, Jinyoung Yeo, Moontae Lee, Kyungjae Lee
•
Jun 3, 2025
•
3
2
Angles Don't Lie: Unlocking Training-Efficient RL Through the Model's Own Signals
Qinsi Wang, Jinghan Ke, Hancheng Ye, Yueqian Lin, Yuzhe Fu, Jianyi Zhang, Kurt Keutzer, Chenfeng Xu, Yiran Chen
•
Jun 2, 2025
•
3
2
Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation
Li Zhou, Lutong Yu, Dongchu Xie, Shaohuan Cheng, Wenyan Li, Haizhou Li
•
Jun 2, 2025
•
3
2
ReFoCUS: Reinforcement-guided Frame Optimization for Contextual Understanding
Hosu Lee, Junho Kim, Hyunjun Kim, Yong Man Ro
•
Jun 2, 2025
•
3
2
SHARE: An SLM-based Hierarchical Action CorREction Assistant for Text-to-SQL
Ge Qu, Jinyang Li, Bowen Qin, Xiaolong Li, Nan Huo, Chenhao Ma, Reynold Cheng
•
May 31, 2025
•
3
2
How Much Backtracking is Enough? Exploring the Interplay of SFT and RL in Enhancing LLM Reasoning
Hongyi James Cai, Junlin Wang, Xiaoyin Chen, Bhuwan Dhingra
•
May 30, 2025
•
3
2
Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding
Xiaoyi Zhang, Zhaoyang Jia, Zongyu Guo, Jiahao Li, Bin Li, Houqiang Li, Yan Lu
•
May 23, 2025
•
3
2
Controllable Human-centric Keyframe Interpolation with Generative Prior
Zujin Guo, Size Wu, Zhongang Cai, Wei Li, Chen Change Loy
•
Jun 3, 2025
•
2
2
TL;DR: Too Long, Do Re-weighting for Effcient LLM Reasoning Compression
Zhong-Zhi Li, Xiao Liang, Zihao Tang, Lei Ji, Peijie Wang, Haotian Xu, Xing W, Haizhen Huang, Weiwei Deng, Ying Nian Wu, Yeyun Gong, Zhijiang Guo, Xiao Liu, Fei Yin, Cheng-Lin Liu
•
Jun 3, 2025
•
2
2
M^3FinMeeting: A Multilingual, Multi-Sector, and Multi-Task Financial Meeting Understanding Evaluation Dataset
Jie Zhu, Junhui Li, Yalong Wen, Xiandong Li, Lifan Guo, Feng Chen
•
Jun 3, 2025
•
2
2
QARI-OCR: High-Fidelity Arabic Text Recognition through Multimodal Large Language Model Adaptation
Ahmed Wasfy, Omer Nacar, Abdelakreem Elkhateb, Mahmoud Reda, Omar Elshehy, Adel Ammar, Wadii Boulila
•
Jun 2, 2025
•
2
2
Control-R: Towards controllable test-time scaling
Di Zhang, Weida Wang, Junxian Li, Xunzhi Wang, Jiatong Li, Jianbo Wu, Jingdi Lei, Haonan He, Peng Ye, Shufei Zhang, Wanli Ouyang, Yuqiang Li, Dongzhan Zhou
•
May 30, 2025
•
2
2
R^2ec: Towards Large Recommender Models with Reasoning
Runyang You, Yongqi Li, Xinyu Lin, Xin Zhang, Wenjie Wang, Wenjie Li, Liqiang Nie
•
May 22, 2025
•
2
2
ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions
Di Chang, Mingdeng Cao, Yichun Shi, Bo Liu, Shengqu Cai, Shijie Zhou, Weilin Huang, Gordon Wetzstein, Mohammad Soleymani, Peng Wang
•
Jun 3, 2025
•
1
2
Revisiting LRP: Positional Attribution as the Missing Ingredient for Transformer Explainability
Yarden Bakish, Itamar Zimerman, Hila Chefer, Lior Wolf
•
Jun 2, 2025
•
1
3
Beyond In-Context Learning: Aligning Long-form Generation of Large Language Models via Task-Inherent Attribute Guidelines
Do Xuan Long, Duong Ngoc Yen, Do Xuan Trong, Luu Anh Tuan, Kenji Kawaguchi, Shafiq Joty, Min-Yen Kan, Nancy F. Chen
•
Jun 2, 2025
•
1
2
Knowing Before Saying: LLM Representations Encode Information About Chain-of-Thought Success Before Completion
Anum Afzal, Florian Matthes, Gal Chechik, Yftah Ziser
•
May 30, 2025
•
1
2