ChatPaper.ai

Daily Papers

RepText: Rendering Visual Text via Replicating

Haofan Wang, Yujia Xu, Yimeng Li, Junchen Li, Chaowei Zhang, Jing Wang, Kejia Yang, Zhibo Chen · Apr 28, 2025

Clinical knowledge in LLMs does not translate to human interactions

Andrew M. Bean, Rebecca Payne, Guy Parsons, Hannah Rose Kirk, Juan Ciro, Rafael Mosquera, Sara Hincapié Monsalve, Aruna S. Ekanayaka, Lionel Tarassenko, Luc Rocher, Adam Mahdi · Apr 26, 2025

LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects

Guangyi Liu, Pengxiang Zhao, Liang Liu, Yaxuan Guo, Han Xiao, Weifeng Lin, Yuxiang Chai, Yue Han, Shuai Ren, Hao Wang, Xiaoyu Liang, Wenhao Wang, Tianze Wu, Linghao Li, Hao Wang, Guanjing Xiong, Yong Liu, Hongsheng Li · Apr 28, 2025

CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenges

Yu Li, Qizhi Pei, Mengyuan Sun, Honglin Lin, Chenlin Ming, Xin Gao, Jiang Wu, Conghui He, Lijun Wu · Apr 27, 2025

SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning

Jiaqi Chen, Bang Zhang, Ruotian Ma, Peisong Wang, Xiaodan Liang, Zhaopeng Tu, Xiaolong Li, Kwan-Yee K. Wong · Apr 27, 2025

MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention

Yucheng Li, Huiqiang Jiang, Chengruidong Zhang, Qianhui Wu, Xufang Luo, Surin Ahn, Amir H. Abdi, Dongsheng Li, Jianfeng Gao, Yuqing Yang, Lili Qiu · Apr 22, 2025

Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency

Zhikai Wang, Jiashuo Sun, Wenqi Zhang, Zhiqiang Hu, Xin Li, Fan Wang, Deli Zhao · Apr 24, 2025

TrustGeoGen: Scalable and Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving

Daocheng Fu, Zijun Chen, Renqiu Xia, Qi Liu, Yuan Feng, Hongbin Zhou, Renrui Zhang, Shiyang Feng, Peng Gao, Junchi Yan, Botian Shi, Bo Zhang, Yu Qiao · Apr 22, 2025

Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, Deshraj Yadav · Apr 28, 2025

Versatile Framework for Song Generation with Prompt-based Control

Yu Zhang, Wenxiang Guo, Changhao Pan, Zhiyuan Zhu, Ruiqi Li, Jingyu Lu, Rongjie Huang, Ruiyuan Zhang, Zhiqing Hong, Ziyue Jiang, Zhou Zhao · Apr 27, 2025

NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks

Chia-Yu Hung, Qi Sun, Pengfei Hong, Amir Zadeh, Chuan Li, U-Xuan Tan, Navonil Majumder, Soujanya Poria · Apr 28, 2025