ChatPaper.ai
Daily Picks of AI Research Papers

Daily curated AI research papers with translations

ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

Qiushi Sun, Zhoumianze Liu, Chang Ma, Zichen Ding, Fangzhi Xu, Zhangyue Yin, Haiteng Zhao, Zhenyu Wu, Kanzhi Cheng, Zhaoyang Liu, Jianing Wang, Qintong Li, Xiangru Tang, Tianbao Xie, Xiachong Feng, Xiang Li, Ben Kao, Wenhai Wang, Biqing Qi, Lingpeng Kong, Zhiyong Wu•May 26, 2025•962

Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

Wei Pang, Kevin Qinghong Lin, Xiangru Jian, Xi He, Philip Torr•May 27, 2025•821

MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs

Jiakang Yuan, Tianshuo Peng, Yilei Jiang, Yiting Lu, Renrui Zhang, Kaituo Feng, Chaoyou Fu, Tao Chen, Lei Bai, Bo Zhang, Xiangyu Yue•May 27, 2025•783

OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data

Yiren Song, Cheng Liu, Mike Zheng Shou•May 24, 2025•622

SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond

Junteng Liu, Yuanxiang Fan, Zhuo Jiang, Han Ding, Yongyi Hu, Chi Zhang, Yiqi Shi, Shitong Weng, Aili Chen, Shiqi Chen, Yunan Huang, Mozhi Zhang, Pengyu Zhao, Junjie Yan, Junxian He•May 26, 2025•572

Exploring the Latent Capacity of LLMs for One-Step Text Generation

Gleb Mezentsev, Ivan Oseledets•May 27, 2025•561

OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation

Shenghai Yuan, Xianyi He, Yufan Deng, Yang Ye, Jinfa Huang, Bin Lin, Chongyang Ma, Jiebo Luo, Li Yuan•May 26, 2025•523

Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning

Michael Hassid, Gabriel Synnaeve, Yossi Adi, Roy Schwartz•May 23, 2025•494

MMMR: Benchmarking Massive Multi-Modal Reasoning Tasks

Guiyao Tie, Xueyang Zhou, Tianhe Gu, Ruihang Zhang, Chaoran Hu, Sizhe Zhang, Mengqu Sun, Yan Zhang, Pan Zhou, Lichao Sun•May 22, 2025•444

Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence

Amirhosein Ghasemabadi, Keith G. Mills, Baochun Li, Di Niu•May 23, 2025•422

VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Guided Iterative Policy Optimization

Yunxin Li, Xinyu Chen, Zitao Li, Zhenyu Liu, Longyue Wang, Wenhan Luo, Baotian Hu, Min Zhang•May 25, 2025•385

Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation

Shuo Yang, Haocheng Xi, Yilong Zhao, Muyang Li, Jintao Zhang, Han Cai, Yujun Lin, Xiuyu Li, Chenfeng Xu, Kelly Peng, Jianfei Chen, Song Han, Kurt Keutzer, Ion Stoica•May 24, 2025•372

MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios

Yang Shi, Huanqian Wang, Wulin Xie, Huanyao Zhang, Lijie Zhao, Yi-Fan Zhang, Xinfeng Li, Chaoyou Fu, Zhuoer Wen, Wenting Liu, Zhuoran Zhang, Xinlong Chen, Bohan Zeng, Sihan Yang, Yuanxing Zhang, Pengfei Wan, Haotian Wang, Wenjing Yang•May 27, 2025•361

UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents

Han Xiao, Guozhi Wang, Yuxiang Chai, Zimu Lu, Weifeng Lin, Hao He, Lue Fan, Liuyang Bian, Rui Hu, Liang Liu, Shuai Ren, Yafei Wen, Xiaoxin Chen, Aojun Zhou, Hongsheng Li•May 27, 2025•351

GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning

Yeonjoon Jung, Daehyun Ahn, Hyungjun Kim, Taesu Kim, Eunhyeok Park•May 26, 2025•332

Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?

Junhao Cheng, Yuying Ge, Teng Wang, Yixiao Ge, Jing Liao, Ying Shan•May 27, 2025•272

SweEval: Do LLMs Really Swear? A Safety Benchmark for Testing Limits for Enterprise Use

Hitesh Laxmichand Patel, Amit Agarwal, Arion Das, Bhargava Kumar, Srikant Panda, Priyaranjan Pattnayak, Taki Hasan Rafi, Tejaswini Kumar, Dong-Kyu Chae•May 22, 2025•273

Reinforcing General Reasoning without Verifiers

Xiangxin Zhou, Zichen Liu, Anya Sims, Haonan Wang, Tianyu Pang, Chongxuan Li, Liang Wang, Min Lin, Chao Du•May 27, 2025•252

rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset

Yifei Liu, Li Lyna Zhang, Yi Zhu, Bingcheng Dong, Xudong Zhou, Ning Shang, Fan Yang, Mao Yang•May 27, 2025•254

MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems

Xuanming Zhang, Yuxuan Chen, Min-Hsuan Yeh, Yixuan Li•May 25, 2025•244

Code Graph Model (CGM): A Graph-Integrated Large Language Model for Repository-Level Software Engineering Tasks

Hongyuan Tao, Ying Zhang, Zhenhao Tang, Hongen Peng, Xukun Zhu, Bingchang Liu, Yingguang Yang, Ziyin Zhang, Zhaogui Xu, Haipeng Zhang, Linchao Zhu, Rui Wang, Hang Yu, Jianguo Li, Peng Di•May 22, 2025•192

HoliTom: Holistic Token Merging for Fast Video Large Language Models

Kele Shao, Keda Tao, Can Qin, Haoxuan You, Yang Sui, Huan Wang•May 27, 2025•182

MotionPro: A Precise Motion Controller for Image-to-Video Generation

Zhongwei Zhang, Fuchen Long, Zhaofan Qiu, Yingwei Pan, Wu Liu, Ting Yao, Tao Mei•May 26, 2025•183

Beyond Distillation: Pushing the Limits of Medical LLM Reasoning with Minimalist Rule-Based RL

Che Liu, Haozhe Wang, Jiazhen Pan, Zhongwei Wan, Yong Dai, Fangzhen Lin, Wenjia Bai, Daniel Rueckert, Rossella Arcucci•May 23, 2025•182

How does Alignment Enhance LLMs' Multilingual Capabilities? A Language Neurons Perspective

Shimao Zhang, Zhejian Lai, Xiang Liu, Shuaijie She, Xiao Liu, Yeyun Gong, Shujian Huang, Jiajun Chen•May 27, 2025•172

NOVA: A Benchmark for Anomaly Localization and Clinical Reasoning in Brain MRI

Cosmin I. Bercea, Jun Li, Philipp Raffler, Evamaria O. Riedel, Lena Schmitzer, Angela Kurz, Felix Bitzer, Paula Roßmüller, Julian Canisius, Mirjam L. Beyrle, Che Liu, Wenjia Bai, Bernhard Kainz, Julia A. Schnabel, Benedikt Wiestler•May 20, 2025•172

Frame In-N-Out: Unbounded Controllable Image-to-Video Generation

Boyang Wang, Xuweiyi Chen, Matheus Gadelha, Zezhou Cheng•May 27, 2025•162

ImgEdit: A Unified Image Editing Dataset and Benchmark

Yang Ye, Xianyi He, Zongjian Li, Bin Lin, Shenghai Yuan, Zhiyuan Yan, Bohan Hou, Li Yuan•May 26, 2025•163

DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction

Yiheng Liu, Liao Qu, Huichao Zhang, Xu Wang, Yi Jiang, Yiming Gao, Hu Ye, Xian Li, Shuai Wang, Daniel K. Du, Shu Cheng, Zehuan Yuan, Xinglong Wu•May 27, 2025•132

Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO

Muzhi Zhu, Hao Zhong, Canyu Zhao, Zongze Du, Zheng Huang, Mingyu Liu, Hao Chen, Cheng Zou, Jingdong Chen, Ming Yang, Chunhua Shen•May 27, 2025•132

Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms

Mengru Wang, Ziwen Xu, Shengyu Mao, Shumin Deng, Zhaopeng Tu, Huajun Chen, Ningyu Zhang•May 23, 2025•132

FinTagging: An LLM-ready Benchmark for Extracting and Structuring Financial Information

Yan Wang, Yang Ren, Lingfei Qian, Xueqing Peng, Keyi Wang, Yi Han, Dongji Feng, Xiao-Yang Liu, Jimin Huang, Qianqian Xie•May 27, 2025•122

ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models

Dingming Li, Hongxing Li, Zixuan Wang, Yuchen Yan, Hang Zhang, Siqi Chen, Guiyang Hou, Shengpei Jiang, Wenqi Zhang, Yongliang Shen, Weiming Lu, Yueting Zhuang•May 27, 2025•102

Thinker: Learning to Think Fast and Slow

Stephen Chung, Wenyu Du, Jie Fu•May 27, 2025•92

Rendering-Aware Reinforcement Learning for Vector Graphics Generation

Juan A. Rodriguez, Haotian Zhang, Abhay Puri, Aarash Feizi, Rishav Pramanik, Pascal Wichmann, Arnab Mondal, Mohammad Reza Samsami, Rabiul Awal, Perouz Taslakian, Spandana Gella, Sai Rajeswar, David Vazquez, Christopher Pal, Marco Pedersoli•May 27, 2025•93

VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection

Zeyi Huang, Yuyang Ji, Anirudh Sundara Rajan, Zefan Cai, Wen Xiao, Junjie Hu, Yong Jae Lee•May 26, 2025•92

Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment

Xiaojun Jia, Sensen Gao, Simeng Qin, Tianyu Pang, Chao Du, Yihao Huang, Xinfeng Li, Yiming Li, Bo Li, Yang Liu•May 27, 2025•82

SeePhys: Does Seeing Help Thinking? Benchmarking Vision-Based Physics Reasoning

Kun Xiang, Heng Li, Terry Jingchen Zhang, Yinya Huang, Zirong Liu, Peixin Qu, Jixi He, Jiaqi Chen, Yu-Jie Yuan, Jianhua Han, Hang Xu, Hanhui Li, Mrinmaya Sachan, Xiaodan Liang•May 25, 2025•83

MMMG: A Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation

Jihan Yao, Yushi Hu, Yujie Yi, Bin Han, Shangbin Feng, Guang Yang, Bingbing Wen, Ranjay Krishna, Lucy Lu Wang, Yulia Tsvetkov, Noah A. Smith, Banghua Zhu•May 23, 2025•82

MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness

Yunlong Tang, Pinxin Liu, Mingqian Feng, Zhangyun Tan, Rui Mao, Chao Huang, Jing Bi, Yunzhong Xiao, Susan Liang, Hang Hua, Ali Vosoughi, Luchuan Song, Zeliang Zhang, Chenliang Xu•May 26, 2025•61

Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution

Jiahao Qiu, Xuan Qi, Tongcheng Zhang, Xinzhe Juan, Jiacheng Guo, Yifu Lu, Yimin Wang, Zixin Yao, Qihan Ren, Xun Jiang, Xing Zhou, Dongrui Liu, Ling Yang, Yue Wu, Kaixuan Huang, Shilong Liu, Hongru Wang, Mengdi Wang•May 26, 2025•64

VideoGameBench: Can Vision-Language Models Complete Popular Video Games?

Alex L. Zhang, Thomas L. Griffiths, Karthik R. Narasimhan, Ofir Press•May 23, 2025•63

Scaling External Knowledge Input Beyond Context Windows of LLMs via Multi-Agent Collaboration

Zijun Liu, Zhennan Wan, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Yang Liu•May 27, 2025•52

Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning

Mingyang Song, Mao Zheng•May 27, 2025•52

Minute-Long Videos with Dual Parallelisms

Zeqing Wang, Bowen Zheng, Xingyi Yang, Yuecong Xu, Xinchao Wang•May 27, 2025•52

Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning

Shenao Zhang, Yaqing Wang, Yinxiao Liu, Tianqi Liu, Peter Grabowski, Eugene Ie, Zhaoran Wang, Yunxuan Li•May 26, 2025•52

Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression

Peijie Dong, Zhenheng Tang, Xiang Liu, Lujun Li, Xiaowen Chu, Bo Li•May 26, 2025•51

BiomedSQL: Text-to-SQL for Scientific Reasoning on Biomedical Knowledge Bases

Mathew J. Koretsky, Maya Willey, Adi Asija, Owen Bianchi, Chelsea X. Alvarado, Tanay Nayak, Nicole Kuznetsov, Sungwon Kim, Mike A. Nalls, Daniel Khashabi, Faraz Faghri•May 23, 2025•52

R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning

Huatong Song, Jinhao Jiang, Wenqing Tian, Zhipeng Chen, Yuhuan Wu, Jiahao Zhao, Yingqian Min, Wayne Xin Zhao, Lei Fang, Ji-Rong Wen•May 22, 2025•52

Search and Refine During Think: Autonomous Retrieval-Augmented Reasoning of LLMs

Yaorui Shi, Shihan Li, Chang Wu, Zhiyuan Liu, Junfeng Fang, Hengxing Cai, An Zhang, Xiang Wang•May 16, 2025•51

Sci-Fi: Symmetric Constraint for Frame Inbetweening

Liuhan Chen, Xiaodong Cun, Xiaoyu Li, Xianyi He, Shenghai Yuan, Jie Chen, Ying Shan, Li Yuan•May 27, 2025•42

SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline

Helin Wang, Jiarui Hai, Dongchao Yang, Chen Chen, Kai Li, Junyi Peng, Thomas Thebaud, Laureano Moro Velazquez, Jesus Villalba, Najim Dehak•May 25, 2025•42

MLLMs are Deeply Affected by Modality Bias

Xu Zheng, Chenfei Liao, Yuqian Fu, Kaiyu Lei, Yuanhuiyi Lyu, Lutao Jiang, Bin Ren, Jialei Chen, Jiawen Wang, Chengxin Li, Linfeng Zhang, Danda Pani Paudel, Xuanjing Huang, Yu-Gang Jiang, Nicu Sebe, Dacheng Tao, Luc Van Gool, Xuming Hu•May 24, 2025•42

Spatial Knowledge Graph-Guided Multimodal Synthesis

Yida Xue, Zhen Bi, Jinnan Yang, Jungang Lou, Huajun Chen, Ningyu Zhang•May 28, 2025•31

Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals

Davide Lobba, Fulvio Sanguigni, Bin Ren, Marcella Cornia, Rita Cucchiara, Nicu Sebe•May 27, 2025•31

VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction

Zhiwen Fan, Jian Zhang, Renjie Li, Junge Zhang, Runjin Chen, Hezhen Hu, Kevin Wang, Huaizhi Qu, Dilin Wang, Zhicheng Yan, Hongyu Xu, Justin Theiss, Tianlong Chen, Jiachen Li, Zhengzhong Tu, Zhangyang Wang, Rakesh Ranjan•May 26, 2025•32

Capability-Based Scaling Laws for LLM Red-Teaming

Alexander Panfilov, Paul Kassianik, Maksym Andriushchenko, Jonas Geiping•May 26, 2025•32

DFIR-Metric: A Benchmark Dataset for Evaluating Large Language Models in Digital Forensics and Incident Response

Bilel Cherif, Tamas Bisztray, Richard A. Dubniczky, Aaesha Aldahmani, Saeed Alshehhi, Norbert Tihanyi•May 26, 2025•32

Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval

Fanheng Kong, Jingyuan Zhang, Yahui Liu, Hongzhi Zhang, Shi Feng, Xiaocui Yang, Daling Wang, Yu Tian, Victoria W., Fuzheng Zhang, Guorui Zhou•May 26, 2025•32

ComfyMind: Toward General-Purpose Generation via Tree-Based Planning and Reactive Feedback

Litao Guo, Xinli Xu, Luozhou Wang, Jiantao Lin, Jinsong Zhou, Zixin Zhang, Bolan Su, Ying-Cong Chen•May 23, 2025•33

AdInject: Real-World Black-Box Attacks on Web Agents via Advertising Delivery

Haowei Wang, Junjie Wang, Xiaojun Jia, Rupeng Zhang, Mingyang Li, Zhe Liu, Yang Liu, Qing Wang•May 27, 2025•22

SATORI-R1: Incentivizing Multimodal Reasoning with Spatial Grounding and Verifiable Rewards

Chuming Shen, Wei Wei, Xiaoye Qu, Yu Cheng•May 25, 2025•22

PreMoe: Lightening MoEs on Constrained Memory by Expert Pruning and Retrieval

Zehua Pei, Ying Zhang, Hui-Ling Zhen, Xianzhi Yu, Wulong Liu, Sinno Jialin Pan, Mingxuan Yuan, Bei Yu•May 23, 2025•22

R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO

Huanjin Yao, Qixiang Yin, Jingyi Zhang, Min Yang, Yibo Wang, Wenhao Wu, Fei Su, Li Shen, Minghui Qiu, Dacheng Tao, Jiaxing Huang•May 22, 2025•22

Absolute Coordinates Make Motion Generation Easy

Zichong Meng, Zeyu Han, Xiaogang Peng, Yiming Xie, Huaizu Jiang•May 26, 2025•12

CoreMatching: A Co-adaptive Sparse Inference Framework with Token and Neuron Pruning for Comprehensive Acceleration of Vision-Language Models

Qinsi Wang, Hancheng Ye, Ming-Yu Chung, Yudong Liu, Yueqian Lin, Martin Kuo, Mingyuan Ma, Jianyi Zhang, Yiran Chen•May 25, 2025•11

Explaining Sources of Uncertainty in Automated Fact-Checking

Jingyi Sun, Greta Warren, Irina Shklovski, Isabelle Augenstein•May 23, 2025•11

Tropical Attention: Neural Algorithmic Reasoning for Combinatorial Algorithms

Baran Hashemi, Kurt Pasque, Chris Teska, Ruriko Yoshida•May 22, 2025•11

Improving Chemical Understanding of LLMs via SMILES Parsing

Yunhui Jang, Jaehyung Kim, Sungsoo Ahn•May 22, 2025•12

Do RAG Systems Suffer From Positional Bias?

Florin Cuconasu, Simone Filice, Guy Horowitz, Yoelle Maarek, Fabrizio Silvestri•May 21, 2025•12

Vision Transformers with Self-Distilled Registers

Yinjie Chen, Zipeng Yan, Chong Zhou, Bo Dai, Andrew F. Luo•May 27, 2025•02

Ankh3: Multi-Task Pretraining with Sequence Denoising and Completion Enhances Protein Representations

Hazem Alsamkary, Mohamed Elshaffei, Mohamed Elkerdawy, Ahmed Elnaggar•May 26, 2025•02

Beyond Simple Concatenation: Fairly Assessing PLM Architectures for Multi-Chain Protein-Protein Interactions Prediction

Hazem Alsamkary, Mohamed Elshaffei, Mohamed Soudy, Sara Ossman, Abdallah Amr, Nehal Adel Abdelsalam, Mohamed Elkerdawy, Ahmed Elnaggar•May 26, 2025•02

An Explainable Diagnostic Framework for Neurodegenerative Dementias via Reinforcement-Optimized LLM Reasoning

Andrew Zamai, Nathanael Fijalkow, Boris Mansencal, Laurent Simon, Eloi Navet, Pierrick Coupe•May 26, 2025•02