ChatPaper.ai

Daily AI Research Paper Picks

Daily curated AI research papers with translations

MiMo-VL Technical Report

Xiaomi LLM-Core Team, Zihao Yue, Zhenru Lin, Yifan Song, Weikun Wang, Shuhuai Ren, Shuhao Gu, Shicheng Li, Peidian Li, Liang Zhao, Lei Li, Kainan Bao, Hao Tian, Hailin Zhang, Gang Wang, Dawei Zhu, Cici, Chenhong He, Bowen Ye, Bowen Shen, Zihan Zhang, Zihan Jiang, Zhixian Zheng, Zhichao Song, Zhenbo Luo, Yue Yu, Yudong Wang, Yuanyuan Tian, Yu Tu, Yihan Yan, Yi Huang, Xu Wang, Xinzhe Xu, Xingchen Song, Xing Zhang, Xing Yong, Xin Zhang, Xiangwei Deng, Wenyu Yang, Wenhan Ma, Weiwei Lv, Weiji Zhuang, Wei Liu, Sirui Deng, Shuo Liu, Shimao Chen, Shihua Yu, Shaohui Liu, Shande Wang, Rui Ma, Qiantong Wang, Peng Wang, Nuo Chen, Menghang Zhu, Kangyang Zhou, Kang Zhou, Kai Fang, Jun Shi, Jinhao Dong, Jiebao Xiao, Jiaming Xu, Huaqiu Liu, Hongshen Xu, Heng Qu, Haochen Zhao, Hanglong Lv, Guoan Wang, Duo Zhang, Dong Zhang, Di Zhang, Chong Ma, Chang Liu, Can Cai, Bingquan Xia•Jun 4, 2025•632

AmbiK: Dataset of Ambiguous Tasks in Kitchen Environment

Anastasiia Ivanova, Eva Bakaeva, Zoya Volovikova, Alexey K. Kovalev, Aleksandr I. Panov•Jun 4, 2025•432

Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning

Shuang Chen, Yue Guo, Zhaochen Su, Yafu Li, Yulun Wu, Jiacheng Chen, Jiayu Chen, Weijie Wang, Xiaoye Qu, Yu Cheng•Jun 4, 2025•414

A Controllable Examination for Long-Context Language Models

Yijun Yang, Zeyu Huang, Wenhao Zhu, Zihan Qiu, Fei Yuan, Jeff Z. Pan, Ivan Titov•Jun 3, 2025•302

MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos

Kejian Zhu, Zhuoran Jin, Hongbang Yuan, Jiachun Li, Shangqing Tu, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao•Jun 4, 2025•282

SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models

Yuhao Wu, Yushi Bai, Zhiqiang Hu, Juanzi Li, Roy Ka-Wei Lee•Jun 4, 2025•262

OpenThoughts: Data Recipes for Reasoning Models

Etash Guha, Ryan Marten, Sedrick Keh, Negin Raoof, Georgios Smyrnis, Hritik Bansal, Marianna Nezhurina, Jean Mercat, Trung Vu, Zayne Sprague, Ashima Suvarna, Benjamin Feuer, Liangyu Chen, Zaid Khan, Eric Frankel, Sachin Grover, Caroline Choi, Niklas Muennighoff, Shiye Su, Wanjia Zhao, John Yang, Shreyas Pimpalgaonkar, Kartik Sharma, Charlie Cheng-Jie Ji, Yichuan Deng, Sarah Pratt, Vivek Ramanujan, Jon Saad-Falcon, Jeffrey Li, Achal Dave, Alon Albalak, Kushal Arora, Blake Wulfe, Chinmay Hegde, Greg Durrett, Sewoong Oh, Mohit Bansal, Saadia Gabriel, Aditya Grover, Kai-Wei Chang, Vaishaal Shankar, Aaron Gokaslan, Mike A. Merrill, Tatsunori Hashimoto, Yejin Choi, Jenia Jitsev, Reinhard Heckel, Maheswaran Sathiamoorthy, Alexandros G. Dimakis, Ludwig Schmidt•Jun 4, 2025•242

Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis

Kejian Zhu, Shangqing Tu, Zhuoran Jin, Lei Hou, Juanzi Li, Jun Zhao•Jun 4, 2025•242

Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation

Tianyu Huang, Wangguandong Zheng, Tengfei Wang, Yuhao Liu, Zhenwei Wang, Junta Wu, Jie Jiang, Hui Li, Rynson W. H. Lau, Wangmeng Zuo, Chunchao Guo•Jun 4, 2025•212

VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation

Yuansheng Ni, Ping Nie, Kai Zou, Xiang Yue, Wenhu Chen•Jun 4, 2025•202

IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation

Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai, Ronald Clark, Ming-Hsuan Yang•Jun 3, 2025•203

Image Editing As Programs with Diffusion Models

Yujia Hu, Songhua Liu, Zhenxiong Tan, Xingyi Yang, Xinchao Wang•Jun 4, 2025•162

Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem

Yubo Wang, Ping Nie, Kai Zou, Lijun Wu, Wenhu Chen•Jun 3, 2025•162

Ψ-Sampler: Initial Particle Sampling for SMC-Based Inference-Time Reward Alignment in Score Models

Taehoon Yoon, Yunhong Min, Kyeongmin Yeo, Minhyuk Sung•Jun 2, 2025•162

LayerFlow: A Unified Model for Layer-aware Video Generation

Sihui Ji, Hao Luo, Xi Chen, Yuanpeng Tu, Yiyang Wang, Hengshuang Zhao•Jun 4, 2025•132

DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models

Ziyi Wu, Anil Kag, Ivan Skorokhodov, Willi Menapace, Ashkan Mirzaei, Igor Gilitschenski, Sergey Tulyakov, Aliaksandr Siarohin•Jun 4, 2025•132

SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation

Siqi Chen, Xinyu Dong, Haolei Xu, Xingyu Wu, Fei Tang, Hang Zhang, Yuchen Yan, Linjuan Wu, Wenqi Zhang, Guiyang Hou, Yongliang Shen, Weiming Lu, Yueting Zhuang•Jun 3, 2025•132

TimeHC-RL: Temporal-aware Hierarchical Cognitive Reinforcement Learning for Enhancing LLMs' Social Intelligence

Guiyang Hou, Xing Gao, Yuchuan Wu, Xiang Huang, Wenqi Zhang, Zhe Zheng, Yongliang Shen, Jialu Du, Fei Huang, Yongbin Li, Weiming Lu•May 30, 2025•112

Rectified Sparse Attention

Yutao Sun, Tianzhu Ye, Li Dong, Yuqing Xia, Jian Chen, Yizhao Gao, Shijie Cao, Jianyong Wang, Furu Wei•Jun 4, 2025•92

Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games

Dongmin Park, Minkyu Kim, Beongjun Choi, Junhyuck Kim, Keon Lee, Jonghyun Lee, Inkyu Park, Byeong-Uk Lee, Jaeyoung Hwang, Jaewoo Ahn, Ameya S. Mahabaleshwarkar, Bilal Kartal, Pritam Biswas, Yoshi Suhara, Kangwook Lee, Jaewoong Cho•Jun 4, 2025•92

Beyond the Surface: Measuring Self-Preference in LLM Judgments

Zhi-Yuan Chen, Hao Wang, Xinyu Zhang, Enrui Hu, Yankai Lin•Jun 3, 2025•82

BenchHub: A Unified Benchmark Suite for Holistic and Customizable LLM Evaluation

Eunsu Kim, Haneul Yoo, Guijin Son, Hitesh Patel, Amit Agarwal, Alice Oh•May 31, 2025•82

TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models

Chetwin Low, Weimin Wang•Jun 3, 2025•72

DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via Diffusion Transformers

Zitong Wang, Hang Zhao, Qianyu Zhou, Xuequan Lu, Xiangtai Li, Yiren Song•May 24, 2025•72

POSS: Position Specialist Generates Better Draft for Speculative Decoding

Langlin Huang, Chengsong Huang, Jixuan Leng, Di Huang, Jiaxin Huang•Jun 4, 2025•62

Robustness in Both Domains: CLIP Needs a Robust Text Encoder

Elias Abad Rocamora, Christian Schlarmann, Naman Deep Singh, Yongtao Wu, Matthias Hein, Volkan Cevher•Jun 3, 2025•62

Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback

Xiaoying Zhang, Hao Sun, Yipeng Zhang, Kaituo Feng, Chaochao Lu, Chao Yang, Helen Meng•Jun 3, 2025•62

Adapt before Continual Learning

Aojun Lu, Tao Feng, Hangjie Yuan, Chunhui Ding, Yanan Sun•Jun 4, 2025•52

Video-Skill-CoT: Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning

Daeun Lee, Jaehong Yoon, Jaemin Cho, Mohit Bansal•Jun 4, 2025•52

CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech

Helin Wang, Jiarui Hai, Dading Chong, Karan Thakkar, Tiantian Feng, Dongchao Yang, Junhyeok Lee, Laureano Moro Velazquez, Jesus Villalba, Zengyi Qin, Shrikanth Narayanan, Mounya Elhiali, Najim Dehak•Jun 3, 2025•53

RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions

Bimsara Pathiraja, Maitreya Patel, Shivam Singh, Yezhou Yang, Chitta Baral•Jun 3, 2025•42

Quantitative LLM Judges

Aishwarya Sahoo, Jeevana Kruthi Karnuthala, Tushar Parmanand Budhwani, Pranchal Agarwal, Sankaran Vaidyanathan, Alexa Siu, Franck Dernoncourt, Jennifer Healey, Nedim Lipka, Ryan Rossi, Uttaran Bhattacharya, Branislav Kveton•Jun 3, 2025•42

Improving Knowledge Distillation Under Unknown Covariate Shift Through Confidence-Guided Data Augmentation

Niclas Popp, Kevin Alexander Laube, Matthias Hein, Lukas Schott•Jun 2, 2025•42

Follow the Flow: Fine-grained Flowchart Attribution with Neurosymbolic Agents

Manan Suri, Puneet Mathur, Nedim Lipka, Franck Dernoncourt, Ryan A. Rossi, Vivek Gupta, Dinesh Manocha•Jun 2, 2025•42

DLP: Dynamic Layerwise Pruning in Large Language Models

Yuli Chen, Bo Cheng, Jiale Han, Yingying Zhang, Yingting Li, Shuhao Zhang•May 27, 2025•42

Unleashing Hour-Scale Video Training for Long Video-Language Understanding

Jingyang Lin, Jialian Wu, Ximeng Sun, Ze Wang, Jiang Liu, Yusheng Su, Xiaodong Yu, Hao Chen, Jiebo Luo, Zicheng Liu, Emad Barsoum•Jun 5, 2025•31

TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems

Shaina Raza, Ranjan Sapkota, Manoj Karkee, Christos Emmanouilidis•Jun 4, 2025•32

HTSC-2025: A Benchmark Dataset of Ambient-Pressure High-Temperature Superconductors for AI-Driven Critical Temperature Prediction

Xiao-Qi Han, Ze-Feng Gao, Xin-De Wang, Zhenfeng Ouyang, Peng-Jie Guo, Zhong-Yi Lu•Jun 4, 2025•32

Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models

Yiran Guo, Lijie Xu, Jie Liu, Dan Ye, Shuang Qiu•May 29, 2025•32

Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning

Qing Jiang, Xingyu Chen, Zhaoyang Zeng, Junzhi Yu, Lei Zhang•Jun 4, 2025•22

Rethinking the Stability-Plasticity Trade-off in Continual Learning from an Architectural Perspective

Aojun Lu, Hangjie Yuan, Tao Feng, Yanan Sun•Jun 4, 2025•22

CRAWLDoc: A Dataset for Robust Ranking of Bibliographic Documents

Fabian Karl, Ansgar Scherp•Jun 4, 2025•22

VLMs Can Aggregate Scattered Training Patches

Zhanhui Zhou, Lingjie Chen, Chao Yang, Chaochao Lu•Jun 4, 2025•22

Robust Neural Rendering in the Wild with Asymmetric Dual 3D Gaussian Splatting

Chengqi Li, Zhihao Shi, Yangdi Lu, Wenbo He, Xiangyu Xu•Jun 4, 2025•22

Solving Inverse Problems with FLAIR

Julius Erbach, Dominik Narnhofer, Andreas Dombos, Bernt Schiele, Jan Eric Lenssen, Konrad Schindler•Jun 3, 2025•22

FinChain: A Symbolic Benchmark for Verifiable Chain-of-Thought Financial Reasoning

Zhuohan Xie, Dhruv Sahnan, Debopriyo Banerjee, Georgi Georgiev, Rushil Thareja, Hachem Madmoun, Jinyan Su, Aaryamonvikram Singh, Yuxia Wang, Rui Xing, Fajri Koto, Haonan Li, Ivan Koychev, Tanmoy Chakraborty, Salem Lahlou, Veselin Stoyanov, Preslav Nakov•Jun 3, 2025•22

Small Language Models are the Future of Agentic AI

Peter Belcak, Greg Heinrich, Shizhe Diao, Yonggan Fu, Xin Dong, Saurav Muralidharan, Yingyan Celine Lin, Pavlo Molchanov•Jun 2, 2025•22

Sounding that Object: Interactive Object-Aware Image to Audio Generation

Tingle Li, Baihe Huang, Xiaobin Zhuang, Dongya Jia, Jiawei Chen, Yuping Wang, Zhuo Chen, Gopala Anumanchipalli, Yuxuan Wang•Jun 4, 2025•12

Survey of Active Learning Hyperparameters: Insights from a Large-Scale Experimental Grid

Julius Gonsior, Tim Rieß, Anja Reusch, Claudio Hartmann, Maik Thiele, Wolfgang Lehner•Jun 4, 2025•12

RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use Agents

Jingyi Yang, Shuai Shao, Dongrui Liu, Jing Shao•May 31, 2025•12