ChatPaper.aiChatPaper.ai
首页

arXiv

HuggingFace

定价账户工作台

•
•

•
•

•
•

•
•

•
•

Footer

Company name

ChatPaper.ai: Your advanced AI reading assistant.

Contact us: [email protected]

X (Twitter)

Products

  • AI Search
  • AI Mind Map
  • Arxiv Summary
  • Huggingface Summary

Support

  • FAQ
  • Contact

Company

  • Blog
  • Privacy Policy
  • Terms of Service

Available Languages

  • 🇬🇧English
  • 🇨🇳中文简体
  • 🇭🇰繁體中文
  • 🇯🇵日本語
  • 🇰🇷한국어
  • 🇩🇪Deutsch
  • 🇫🇷Français
  • 🇷🇺Русский
  • 🇪🇸Español

© 2025 chatpaper.ai All rights reserved.

AI研究论文每日精选

每日精选AI研究论文及翻译

科学委员会:评估多模态自主代理在真实科学工作流程中的表现
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

Qiushi Sun, Zhoumianze Liu, Chang Ma, Zichen Ding, Fangzhi Xu, Zhangyue Yin, Haiteng Zhao, Zhenyu Wu, Kanzhi Cheng, Zhaoyang Liu, Jianing Wang, Qintong Li, Xiangru Tang, Tianbao Xie, Xiachong Feng, Xiang Li, Ben Kao, Wenhai Wang, Biqing Qi, Lingpeng Kong, Zhiyong Wu•May 26, 2025•902

Paper2Poster:迈向从科研论文到多模态海报的自动化生成
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

Wei Pang, Kevin Qinghong Lin, Xiangru Jian, Xi He, Philip Torr•May 27, 2025•791

MME-Reasoning:多模态大语言模型逻辑推理综合基准测试
MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs

Jiakang Yuan, Tianshuo Peng, Yilei Jiang, Yiting Lu, Renrui Zhang, Kaituo Feng, Chaoyou Fu, Tao Chen, Lei Bai, Bo Zhang, Xiangyu Yue•May 27, 2025•763

OmniConsistency:从成对风格化数据中学习风格无关的一致性
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data

Yiren Song, Cheng Liu, Mike Zheng Shou•May 24, 2025•592

OpenS2V-Nexus:面向主题到视频生成的详尽基准与百万级数据集
OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation

Shenghai Yuan, Xianyi He, Yufan Deng, Yang Ye, Jinfa Huang, Bin Lin, Chongyang Ma, Jiebo Luo, Li Yuan•May 26, 2025•493

SynLogic:大规模合成可验证推理数据,助力逻辑推理及更广泛领域的学习
SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond

Junteng Liu, Yuanxiang Fan, Zhuo Jiang, Han Ding, Yongyi Hu, Chi Zhang, Yiqi Shi, Shitong Weng, Aili Chen, Shiqi Chen, Yunan Huang, Mozhi Zhang, Pengyu Zhao, Junjie Yan, Junxian He•May 26, 2025•452

勿需过度思考:优选简短思维链以提升大语言模型推理能力
Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning

Michael Hassid, Gabriel Synnaeve, Yossi Adi, Roy Schwartz•May 23, 2025•454

探索大语言模型在单步文本生成中的潜在能力
Exploring the Latent Capacity of LLMs for One-Step Text Generation

Gleb Mezentsev, Ivan Oseledets•May 27, 2025•441

直觉引导:基于强化内在置信度的高效测试时扩展
Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence

Amirhosein Ghasemabadi, Keith G. Mills, Baochun Li, Di Niu•May 23, 2025•402

MMMR:大规模多模态推理任务基准测试
MMMR: Benchmarking Massive Multi-Modal Reasoning Tasks

Guiyao Tie, Xueyang Zhou, Tianhe Gu, Ruihang Zhang, Chaoran Hu, Sizhe Zhang, Mengqu Sun, Yan Zhang, Pan Zhou, Lichao Sun•May 22, 2025•384

VerIPO:通过验证器引导的迭代策略优化培养视频大语言模型的长程推理能力
VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Gudied Iterative Policy Optimization

Yunxin Li, Xinyu Chen, Zitao Li, Zhenyu Liu, Longyue Wang, Wenhan Luo, Baotian Hu, Min Zhang•May 25, 2025•364

Sparse VideoGen2:通过语义感知重排序的稀疏注意力加速视频生成
Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation

Shuo Yang, Haocheng Xi, Yilong Zhao, Muyang Li, Jintao Zhang, Han Cai, Yujun Lin, Xiuyu Li, Chenfeng Xu, Kelly Peng, Jianfei Chen, Song Han, Kurt Keutzer, Ion Stoica•May 24, 2025•362

UI-Genie:一种基于多模态大语言模型的移动GUI代理迭代增强自优化方法
UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents

Han Xiao, Guozhi Wang, Yuxiang Chai, Zimu Lu, Weifeng Lin, Hao He, Lue Fan, Liuyang Bian, Rui Hu, Liang Liu, Shuai Ren, Yafei Wen, Xiaoxin Chen, Aojun Zhou, Hongsheng Li•May 27, 2025•351

MME-VideoOCR:评估多模态大语言模型在视频场景中的OCR能力
MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios

Yang Shi, Huanqian Wang, Wulin Xie, Huanyao Zhang, Lijie Zhao, Yi-Fan Zhang, Xinfeng Li, Chaoyou Fu, Zhuoer Wen, Wenting Liu, Zhuoran Zhang, Xinlong Chen, Bohan Zeng, Sihan Yang, Yuanxing Zhang, Pengfei Wan, Haotian Wang, Wenjing Yang•May 27, 2025•301

GraLoRA:面向参数高效微调的细粒度低秩适配
GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning

Yeonjoon Jung, Daehyun Ahn, Hyungjun Kim, Taesu Kim, Eunhyeok Park•May 26, 2025•302

Video-Holmes:多模态大语言模型能否像福尔摩斯一样进行复杂视频推理?
Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?

Junhao Cheng, Yuying Ge, Teng Wang, Yixiao Ge, Jing Liao, Ying Shan•May 27, 2025•272

rStar-Coder:基于大规模验证数据集扩展竞争性代码推理能力
rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset

Yifei Liu, Li Lyna Zhang, Yi Zhu, Bingcheng Dong, Xudong Zhou, Ning Shang, Fan Yang, Mao Yang•May 27, 2025•233

MetaMind:运用元认知多智能体系统模拟人类社交思维
MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems

Xuanming Zhang, Yuxuan Chen, Min-Hsuan Yeh, Yixuan Li•May 25, 2025•234

无需验证器强化通用推理能力
Reinforcing General Reasoning without Verifiers

Xiangxin Zhou, Zichen Liu, Anya Sims, Haonan Wang, Tianyu Pang, Chongxuan Li, Liang Wang, Min Lin, Chao Du•May 27, 2025•182

超越蒸馏:以极简规则强化学习突破医疗大语言模型推理的极限
Beyond Distillation: Pushing the Limits of Medical LLM Reasoning with Minimalist Rule-Based RL

Che Liu, Haozhe Wang, Jiazhen Pan, Zhongwei Wan, Yong Dai, Fangzhen Lin, Wenjia Bai, Daniel Rueckert, Rossella Arcucci•May 23, 2025•182

代码图模型(CGM):一种图集成的大型语言模型,用于仓库级软件工程任务
Code Graph Model (CGM): A Graph-Integrated Large Language Model for Repository-Level Software Engineering Tasks

Hongyuan Tao, Ying Zhang, Zhenhao Tang, Hongen Peng, Xukun Zhu, Bingchang Liu, Yingguang Yang, Ziyin Zhang, Zhaogui Xu, Haipeng Zhang, Linchao Zhu, Rui Wang, Hang Yu, Jianguo Li, Peng Di•May 22, 2025•182

HoliTom:面向快速视频大语言模型的全方位令牌融合技术
HoliTom: Holistic Token Merging for Fast Video Large Language Models

Kele Shao, Keda Tao, Can Qin, Haoxuan You, Yang Sui, Huan Wang•May 27, 2025•172

MotionPro:面向图像到视频生成的精准运动控制器
MotionPro: A Precise Motion Controller for Image-to-Video Generation

Zhongwei Zhang, Fuchen Long, Zhaofan Qiu, Yingwei Pan, Wu Liu, Ting Yao, Tao Mei•May 26, 2025•163

NOVA:脑部MRI异常定位与临床推理基准测试
NOVA: A Benchmark for Anomaly Localization and Clinical Reasoning in Brain MRI

Cosmin I. Bercea, Jun Li, Philipp Raffler, Evamaria O. Riedel, Lena Schmitzer, Angela Kurz, Felix Bitzer, Paula Roßmüller, Julian Canisius, Mirjam L. Beyrle, Che Liu, Wenjia Bai, Bernhard Kainz, Julia A. Schnabel, Benedikt Wiestler•May 20, 2025•162

对齐如何提升大语言模型的多语言能力?从语言神经元视角解析
How does Alignment Enhance LLMs' Multilingual Capabilities? A Language Neurons Perspective

Shimao Zhang, Zhejian Lai, Xiang Liu, Shuaijie She, Xiao Liu, Yeyun Gong, Shujian Huang, Jiajun Chen•May 27, 2025•152

ImgEdit:统一的图像编辑数据集与基准平台
ImgEdit: A Unified Image Editing Dataset and Benchmark

Yang Ye, Xianyi He, Zongjian Li, Bin Lin, Shenghai Yuan, Zhiyuan Yan, Bohan Hou, Li Yuan•May 26, 2025•153

超越提示工程:通过目标原子实现LLM的稳健行为控制
Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms

Mengru Wang, Ziwen Xu, Shengyu Mao, Shumin Deng, Zhaopeng Tu, Huajun Chen, Ningyu Zhang•May 23, 2025•132

SweEval:大语言模型真的会说脏话吗?面向企业应用的安全基准测试
SweEval: Do LLMs Really Swear? A Safety Benchmark for Testing Limits for Enterprise Use

Hitesh Laxmichand Patel, Amit Agarwal, Arion Das, Bhargava Kumar, Srikant Panda, Priyaranjan Pattnayak, Taki Hasan Rafi, Tejaswini Kumar, Dong-Kyu Chae•May 22, 2025•133

帧内帧外:无界可控的图像到视频生成
Frame In-N-Out: Unbounded Controllable Image-to-Video Generation

Boyang Wang, Xuweiyi Chen, Matheus Gadelha, Zezhou Cheng•May 27, 2025•122

DetailFlow:基于下一细节预测的一维由粗到细自回归图像生成
DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction

Yiheng Liu, Liao Qu, Huichao Zhang, Xu Wang, Yi Jiang, Yiming Gao, Hu Ye, Xian Li, Shuai Wang, Daniel K. Du, Shu Cheng, Zehuan Yuan, Xinglong Wu•May 27, 2025•121

Active-O3:通过GRPO赋能多模态大语言模型的主动感知能力
Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO

Muzhi Zhu, Hao Zhong, Canyu Zhao, Zongze Du, Zheng Huang, Mingyu Liu, Hao Chen, Cheng Zou, Jingdong Chen, Ming Yang, Chunhua Shen•May 27, 2025•112

ViewSpatial-Bench:评估视觉-语言模型中的多视角空间定位能力
ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models

Dingming Li, Hongxing Li, Zixuan Wang, Yuchen Yan, Hang Zhang, Siqi Chen, Guiyang Hou, Shengpei Jiang, Wenqi Zhang, Yongliang Shen, Weiming Lu, Yueting Zhuang•May 27, 2025•102

FinTagging:面向大语言模型的金融信息提取与结构化基准测试
FinTagging: An LLM-ready Benchmark for Extracting and Structuring Financial Information

Yan Wang, Yang Ren, Lingfei Qian, Xueqing Peng, Keyi Wang, Yi Han, Dongji Feng, Xiao-Yang Liu, Jimin Huang, Qianqian Xie•May 27, 2025•92

思考者:学习快速与慢速思维
Thinker: Learning to Think Fast and Slow

Stephen Chung, Wenyu Du, Jie Fu•May 27, 2025•82

面向渲染的强化学习在矢量图形生成中的应用
Rendering-Aware Reinforcement Learning for Vector Graphics Generation

Juan A. Rodriguez, Haotian Zhang, Abhay Puri, Aarash Feizi, Rishav Pramanik, Pascal Wichmann, Arnab Mondal, Mohammad Reza Samsami, Rabiul Awal, Perouz Taslakian, Spandana Gella, Sai Rajeswar, David Vazquez, Christopher Pal, Marco Pedersoli•May 27, 2025•82

SeePhys:视觉是否助力思维?——基于视觉的物理推理基准测试
SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning

Kun Xiang, Heng Li, Terry Jingchen Zhang, Yinya Huang, Zirong Liu, Peixin Qu, Jixi He, Jiaqi Chen, Yu-Jie Yuan, Jianhua Han, Hang Xu, Hanhui Li, Mrinmaya Sachan, Xiaodan Liang•May 25, 2025•83

MMMG:多任务多模态生成的全面可靠评估套件
MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation

Jihan Yao, Yushi Hu, Yujie Yi, Bin Han, Shangbin Feng, Guang Yang, Bingbing Wen, Ranjay Krishna, Lucy Lu Wang, Yulia Tsvetkov, Noah A. Smith, Banghua Zhu•May 23, 2025•82

针对闭源多模态大语言模型的特征最优对齐对抗攻击
Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment

Xiaojun Jia, Sensen Gao, Simeng Qin, Tianyu Pang, Chao Du, Yihao Huang, Xinfeng Li, Yiming Li, Bo Li, Yang Liu•May 27, 2025•62

MMPerspective:多模态大语言模型能否理解视角?一个全面的视角感知、推理与鲁棒性基准
MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness

Yunlong Tang, Pinxin Liu, Mingqian Feng, Zhangyun Tan, Rui Mao, Chao Huang, Jing Bi, Yunzhong Xiao, Susan Liang, Hang Hua, Ali Vosoughi, Luchuan Song, Zeliang Zhang, Chenliang Xu•May 26, 2025•61

视觉工具代理(VisTA):一种用于视觉工具选择的强化学习框架
VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection

Zeyi Huang, Yuyang Ji, Anirudh Sundara Rajan, Zefan Cai, Wen Xiao, Junjie Hu, Yong Jae Lee•May 26, 2025•62

VideoGameBench:视觉-语言模型能否通关热门视频游戏?
VideoGameBench: Can Vision-Language Models complete popular video games?

Alex L. Zhang, Thomas L. Griffiths, Karthik R. Narasimhan, Ofir Press•May 23, 2025•62

通过多智能体协作扩展大语言模型上下文窗口之外的外部知识输入
Scaling External Knowledge Input Beyond Context Windows of LLMs via Multi-Agent Collaboration

Zijun Liu, Zhennan Wan, Peng Li, Ming Yan, Ji Zhang, Fei Huang, Yang Liu•May 27, 2025•52

行稳致远!通过强化学习实现简洁的大型语言模型推理
Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning

Mingyang Song, Mao Zheng•May 27, 2025•52

超越马尔可夫性:通过贝叶斯自适应强化学习实现大语言模型的反思性探索
Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning

Shenao Zhang, Yaqing Wang, Yinxiao Liu, Tianqi Liu, Peter Grabowski, Eugene Ie, Zhaoran Wang, Yunxuan Li•May 26, 2025•52

Alita:通用型智能体——以最小预定义与最大自我进化实现可扩展的代理推理
Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution

Jiahao Qiu, Xuan Qi, Tongcheng Zhang, Xinzhe Juan, Jiacheng Guo, Yifu Lu, Yimin Wang, Zixin Yao, Qihan Ren, Xun Jiang, Xing Zhou, Dongrui Liu, Ling Yang, Yue Wu, Kaixuan Huang, Shilong Liu, Hongru Wang, Mengdi Wang•May 26, 2025•52

压缩后的大语言模型能否真正行动?大语言模型压缩中代理能力的实证评估
Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression

Peijie Dong, Zhenheng Tang, Xiang Liu, Lujun Li, Xiaowen Chu, Bo Li•May 26, 2025•51

BiomedSQL:面向生物医学知识库科学推理的文本到SQL转换系统
BiomedSQL: Text-to-SQL for Scientific Reasoning on Biomedical Knowledge Bases

Mathew J. Koretsky, Maya Willey, Adi Asija, Owen Bianchi, Chelsea X. Alvarado, Tanay Nayak, Nicole Kuznetsov, Sungwon Kim, Mike A. Nalls, Daniel Khashabi, Faraz Faghri•May 23, 2025•52

R1-Searcher++:通过强化学习激励大语言模型的动态知识获取
R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning

Huatong Song, Jinhao Jiang, Wenqing Tian, Zhipeng Chen, Yuhuan Wu, Jiahao Zhao, Yingqian Min, Wayne Xin Zhao, Lei Fang, Ji-Rong Wen•May 22, 2025•52

思考中的搜索与精炼:大语言模型的自主检索增强推理
Search and Refine During Think: Autonomous Retrieval-Augmented Reasoning of LLMs

Yaorui Shi, Shihan Li, Chang Wu, Zhiyuan Liu, Junfeng Fang, Hengxing Cai, An Zhang, Xiang Wang•May 16, 2025•51

科幻帧间插值:对称约束下的帧生成
Sci-Fi: Symmetric Constraint for Frame Inbetweening

Liuhan Chen, Xiaodong Cun, Xiaoyu Li, Xianyi He, Shenghai Yuan, Jie Chen, Ying Shan, Li Yuan•May 27, 2025•42

双并行机制下的分钟级视频处理
Minute-Long Videos with Dual Parallelisms

Zeqing Wang, Bowen Zheng, Xingyi Yang, Yuecong Xu, Xinchao Wang•May 27, 2025•42

SoloSpeech:通过级联生成式管道提升目标语音提取的清晰度与质量
SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline

Helin Wang, Jiarui Hai, Dongchao Yang, Chen Chen, Kai Li, Junyi Peng, Thomas Thebaud, Laureano Moro Velazquez, Jesus Villalba, Najim Dehak•May 25, 2025•42

多模态大语言模型(MLLMs)深受模态偏差的影响。
MLLMs are Deeply Affected by Modality Bias

Xu Zheng, Chenfei Liao, Yuqian Fu, Kaiyu Lei, Yuanhuiyi Lyu, Lutao Jiang, Bin Ren, Jialei Chen, Jiawen Wang, Chengxin Li, Linfeng Zhang, Danda Pani Paudel, Xuanjing Huang, Yu-Gang Jiang, Nicu Sebe, Dacheng Tao, Luc Van Gool, Xuming Hu•May 24, 2025•42

DFIR-Metric:一个用于评估大语言模型在数字取证与事件响应中表现的基准数据集
DFIR-Metric: A Benchmark Dataset for Evaluating Large Language Models in Digital Forensics and Incident Response

Bilel Cherif, Tamas Bisztray, Richard A. Dubniczky, Aaesha Aldahmani, Saeed Alshehhi, Norbert Tihanyi•May 26, 2025•32

模态整合:构建通用嵌入以实现高级多模态信息检索
Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval

Fanheng Kong, Jingyuan Zhang, Yahui Liu, Hongzhi Zhang, Shi Feng, Xiaocui Yang, Daling Wang, Yu Tian, Victoria W., Fuzheng Zhang, Guorui Zhou•May 26, 2025•32

空间知识图谱引导的多模态合成
Spatial Knowledge Graph-Guided Multimodal Synthesis

Yida Xue, Zhen Bi, Jinnan Yang, Jungang Lou, Huajun Chen, Ningyu Zhang•May 28, 2025•21

AdInject:通过广告投放对网络代理实施真实世界黑盒攻击
AdInject: Real-World Black-Box Attacks on Web Agents via Advertising Delivery

Haowei Wang, Junjie Wang, Xiaojun Jia, Rupeng Zhang, Mingyang Li, Zhe Liu, Yang Liu, Qing Wang•May 27, 2025•22

VLM-3R:融合指令对齐三维重建的视觉-语言模型
VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction

Zhiwen Fan, Jian Zhang, Renjie Li, Junge Zhang, Runjin Chen, Hezhen Hu, Kevin Wang, Huaizhi Qu, Dilin Wang, Zhicheng Yan, Hongyu Xu, Justin Theiss, Tianlong Chen, Jiachen Li, Zhengzhong Tu, Zhangyang Wang, Rakesh Ranjan•May 26, 2025•22

基于能力的大语言模型红队测试扩展法则
Capability-Based Scaling Laws for LLM Red-Teaming

Alexander Panfilov, Paul Kassianik, Maksym Andriushchenko, Jonas Geiping•May 26, 2025•22

SATORI-R1:通过空间定位与可验证奖励机制激励多模态推理
SATORI-R1: Incentivizing Multimodal Reasoning with Spatial Grounding and Verifiable Rewards

Chuming Shen, Wei Wei, Xiaoye Qu, Yu Cheng•May 25, 2025•21

ComfyMind:基于树形规划与反应式反馈的通用生成框架
ComfyMind: Toward General-Purpose Generation via Tree-Based Planning and Reactive Feedback

Litao Guo, Xinli Xu, Luozhou Wang, Jiantao Lin, Jinsong Zhou, Zixin Zhang, Bolan Su, Ying-Cong Chen•May 23, 2025•23

PreMoe:通过专家剪枝与检索实现受限内存下的轻量化混合专家模型
PreMoe: Lightening MoEs on Constrained Memory by Expert Pruning and Retrieval

Zehua Pei, Ying Zhang, Hui-Ling Zhen, Xianzhi Yu, Wulong Liu, Sinno Jialin Pan, Mingxuan Yuan, Bei Yu•May 23, 2025•22

R1-ShareVL:通过共享梯度奖励策略优化激励多模态大语言模型的推理能力
R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO

Huanjin Yao, Qixiang Yin, Jingyi Zhang, Min Yang, Yibo Wang, Wenhao Wu, Fei Su, Li Shen, Minghui Qiu, Dacheng Tao, Jiaxing Huang•May 22, 2025•22

逆向虚拟试衣:从着装个体生成多品类产品风格图像
Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals

Davide Lobba, Fulvio Sanguigni, Bin Ren, Marcella Cornia, Rita Cucchiara, Nicu Sebe•May 27, 2025•11

绝对坐标简化运动生成
Absolute Coordinates Make Motion Generation Easy

Zichong Meng, Zeyu Han, Xiaogang Peng, Yiming Xie, Huaizu Jiang•May 26, 2025•12

CoreMatching:一种协同自适应稀疏推理框架,通过令牌与神经元剪枝实现视觉-语言模型的全面加速
CoreMatching: A Co-adaptive Sparse Inference Framework with Token and Neuron Pruning for Comprehensive Acceleration of Vision-Language Models

Qinsi Wang, Hancheng Ye, Ming-Yu Chung, Yudong Liu, Yueqian Lin, Martin Kuo, Mingyuan Ma, Jianyi Zhang, Yiran Chen•May 25, 2025•11

自动化事实核查中的不确定性来源解析
Explaining Sources of Uncertainty in Automated Fact-Checking

Jingyi Sun, Greta Warren, Irina Shklovski, Isabelle Augenstein•May 23, 2025•11

热带注意力机制:面向组合算法的神经算法推理
Tropical Attention: Neural Algorithmic Reasoning for Combinatorial Algorithms

Baran Hashemi, Kurt Pasque, Chris Teska, Ruriko Yoshida•May 22, 2025•11

通过SMILES解析提升大语言模型的化学理解能力
Improving Chemical Understanding of LLMs via SMILES Parsing

Yunhui Jang, Jaehyung Kim, Sungsoo Ahn•May 22, 2025•12

RAG系统是否存在位置偏差?
Do RAG Systems Suffer From Positional Bias?

Florin Cuconasu, Simone Filice, Guy Horowitz, Yoelle Maarek, Fabrizio Silvestri•May 21, 2025•12

自蒸馏寄存器视觉Transformer
Vision Transformers with Self-Distilled Registers

Yinjie Chen, Zipeng Yan, Chong Zhou, Bo Dai, Andrew F. Luo•May 27, 2025•02

Ankh3:通过序列去噪与补全的多任务预训练提升蛋白质表征能力
Ankh3: Multi-Task Pretraining with Sequence Denoising and Completion Enhances Protein Representations

Hazem Alsamkary, Mohamed Elshaffei, Mohamed Elkerdawy, Ahmed Elnaggar•May 26, 2025•02

超越简单拼接:公平评估用于多链蛋白质-蛋白质相互作用预测的预训练语言模型架构
Beyond Simple Concatenation: Fairly Assessing PLM Architectures for Multi-Chain Protein-Protein Interactions Prediction

Hazem Alsamkary, Mohamed Elshaffei, Mohamed Soudy, Sara Ossman, Abdallah Amr, Nehal Adel Abdelsalam, Mohamed Elkerdawy, Ahmed Elnaggar•May 26, 2025•02

基于强化优化大语言模型推理的神经退行性痴呆可解释诊断框架
An Explainable Diagnostic Framework for Neurodegenerative Dementias via Reinforcement-Optimized LLM Reasoning

Andrew Zamai, Nathanael Fijalkow, Boris Mansencal, Laurent Simon, Eloi Navet, Pierrick Coupe•May 26, 2025•02