ChatPaper.ai
Daily AI Research Paper Picks
A daily selection of AI research papers, with translations
March 17th, 2025
API Agents vs. GUI Agents: Divergence and Convergence
Chaoyun Zhang, Shilin He, Liqun Li, Si Qin, Yu Kang, Qingwei Lin, Dongmei Zhang
Mar 14, 2025
37
2
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity
Kwanyoung Kim, Byeongsu Sim
Mar 10, 2025
84
2
MaRI: Material Retrieval Integration across Domains
Jianhui Wang, Zhifei Yang, Yangfan He, Huixiong Zhang, Yuxuan Chen, Jingwei Huang
Mar 11, 2025
7
2
Group-robust Machine Unlearning
Thomas De Min, Subhankar Roy, Stéphane Lathuilière, Elisa Ricci, Massimiliano Mancini
Mar 12, 2025
1
2
CHOrD: Generation of Collision-Free, House-Scale, and Organized Digital Twins for 3D Indoor Scenes with Controllable Floor Plans and Optimal Layouts
Chong Su, Yingbin Fu, Zheyuan Hu, Jing Yang, Param Hanji, Shaojun Wang, Xuan Zhao, Cengiz Öztireli, Fangcheng Zhong
Mar 15, 2025
3
3
Technologies on Effectiveness and Efficiency: A Survey of State Spaces Models
Xingtai Lv, Youbang Sun, Kaiyan Zhang, Shang Qu, Xuekai Zhu, Yuchen Fan, Yi Wu, Ermo Hua, Xinwei Long, Ning Ding, Bowen Zhou
Mar 14, 2025
27
2
Neighboring Autoregressive Modeling for Efficient Visual Generation
Yefei He, Yuanyu He, Shaoxuan He, Feng Chen, Hong Zhou, Kaipeng Zhang, Bohan Zhuang
Mar 12, 2025
8
3
Can Large Reasoning Models do Analogical Reasoning under Perceptual Uncertainty?
Giacomo Camposampiero, Michael Hersche, Roger Wattenhofer, Abu Sebastian, Abbas Rahimi
Mar 14, 2025
5
2
ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges
Jiaxin Ai, Pengfei Zhou, Zhaopan Xu, Ming Li, Fanrui Zhang, Zizhen Li, Jianwen Sun, Yukang Feng, Baojin Huang, Zhongyuan Wang, Kaipeng Zhang
Mar 9, 2025
8
2
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
Ahmed Nassar, Andres Marafioti, Matteo Omenetti, Maksym Lysak, Nikolaos Livathinos, Christoph Auer, Lucas Morin, Rafael Teixeira de Lima, Yusik Kim, A. Said Gurbuz, Michele Dolfi, Miquel Farré, Peter W. J. Staar
Mar 14, 2025
100
14
Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption
Luozheng Qin, Zhiyu Tan, Mengping Yang, Xiaomeng Yang, Hao Li
Mar 12, 2025
5
2
TreeMeshGPT: Artistic Mesh Generation with Autoregressive Tree Sequencing
Stefan Lionar, Jiabin Liang, Gim Hee Lee
Mar 14, 2025
6
2
ARMOR v0.1: Empowering Autoregressive Multimodal Understanding Model with Interleaved Multimodal Generation via Asymmetric Synergy
Jianwen Sun, Yukang Feng, Chuanhao Li, Fanrui Zhang, Zizhen Li, Jiaxin Ai, Sizhuo Zhou, Yu Dai, Shenglin Zhang, Kaipeng Zhang
Mar 9, 2025
8
2
ETCH: Generalizing Body Fitting to Clothed Humans via Equivariant Tightness
Boqian Li, Haiwen Feng, Zeyu Cai, Michael J. Black, Yuliang Xiu
Mar 13, 2025
8
2
Kolmogorov-Arnold Attention: Is Learnable Attention Better For Vision Transformers?
Subhajit Maity, Killian Hitsman, Xin Li, Aritra Dutta
Mar 13, 2025
14
2
TxAgent: An AI Agent for Therapeutic Reasoning Across a Universe of Tools
Shanghua Gao, Richard Zhu, Zhenglun Kong, Ayush Noori, Xiaorui Su, Curtis Ginder, Theodoros Tsiligkaridis, Marinka Zitnik
Mar 14, 2025
17
3
Learning Few-Step Diffusion Models by Trajectory Distribution Matching
Yihong Luo, Tianyang Hu, Jiacheng Sun, Yujun Cai, Jing Tang
Mar 9, 2025
7
3
Adversarial Data Collection: Human-Collaborative Perturbations for Efficient and Robust Robotic Imitation Learning
Siyuan Huang, Yue Liao, Siyuan Feng, Shu Jiang, Si Liu, Hongsheng Li, Maoqing Yao, Guanghui Ren
Mar 14, 2025
36
2
From TOWER to SPIRE: Adding the Speech Modality to a Text-Only LLM
Kshitij Ambilduke, Ben Peters, Sonal Sannigrahi, Anil Keshwani, Tsz Kin Lam, Bruno Martins, Marcely Zanon Boito, André F. T. Martins
Mar 13, 2025
7
2
Open-World Skill Discovery from Unsegmented Demonstrations
Jingwen Deng, Zihao Wang, Shaofei Cai, Anji Liu, Yitao Liang
Mar 11, 2025
5
3
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He, Qihang Yu, Qihao Liu, Liang-Chieh Chen
Mar 13, 2025
19
2
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers
Weiming Ren, Wentao Ma, Huan Yang, Cong Wei, Ge Zhang, Wenhu Chen
Mar 14, 2025
20
2
GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving
Zebin Xing, Xingyu Zhang, Yang Hu, Bo Jiang, Tong He, Qian Zhang, Xiaoxiao Long, Wei Yin
Mar 7, 2025
3
2
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
Jianhong Bai, Menghan Xia, Xiao Fu, Xintao Wang, Lianrui Mu, Jinwen Cao, Zuozhu Liu, Haoji Hu, Xiang Bai, Pengfei Wan, Di Zhang
Mar 14, 2025
140
5
Exploring the Vulnerabilities of Federated Learning: A Deep Dive into Gradient Inversion Attacks
Pengxin Guo, Runxi Wang, Shuang Zeng, Jinjing Zhu, Haoning Jiang, Yanran Wang, Yuyin Zhou, Feifei Wang, Hui Xiong, Liangqiong Qu
Mar 13, 2025
16
2
Large-scale Pre-training for Grounded Video Caption Generation
Evangelos Kazakos, Cordelia Schmid, Josef Sivic
Mar 13, 2025
17
2
VGGT: Visual Geometry Grounded Transformer
Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, David Novotny
Mar 14, 2025
21
2