ChatPaper.ai

Daily AI Research Paper Picks

A daily selection of AI research papers, with translations

SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training

Jianyi Wang, Shanchuan Lin, Zhijie Lin, Yuxi Ren, Meng Wei, Zongsheng Yue, Shangchen Zhou, Hao Chen, Yang Zhao, Ceyuan Yang, Xuefeng Xiao, Chen Change Loy, Lu Jiang•Jun 5, 2025•441

ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development

Zhenran Xu, Xue Yang, Yiyu Wang, Qingli Hu, Zijiao Wu, Longyue Wang, Weihua Luo, Kaifu Zhang, Baotian Hu, Min Zhang•Jun 5, 2025•431

Video World Models with Long-term Spatial Memory

Tong Wu, Shuai Yang, Ryan Po, Yinghao Xu, Ziwei Liu, Dahua Lin, Gordon Wetzstein•Jun 5, 2025•361

RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics

Enshen Zhou, Jingkun An, Cheng Chi, Yi Han, Shanyu Rong, Chi Zhang, Pengwei Wang, Zhongyuan Wang, Tiejun Huang, Lu Sheng, Shanghang Zhang•Jun 4, 2025•363

Diagonal Batching Unlocks Parallelism in Recurrent Memory Transformers for Long Contexts

Danil Sivtsov, Ivan Rodkin, Gleb Kuzmin, Yuri Kuratov, Ivan Oseledets•Jun 5, 2025•333

Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Dayiheng Liu, Junyang Lin, Fei Huang, Jingren Zhou•Jun 5, 2025•321

Surfer-H Meets Holo1: Cost-Efficient Web Agent Powered by Open Weights

Mathieu Andreux, Breno Baldas Skuk, Hamza Benchekroun, Emilien Biré, Antoine Bonnet, Riaz Bordie, Matthias Brunel, Pierre-Louis Cedoz, Antoine Chassang, Mickaël Chen, Alexandra D. Constantinou, Antoine d'Andigné, Hubert de La Jonquière, Aurélien Delfosse, Ludovic Denoyer, Alexis Deprez, Augustin Derupti, Michael Eickenberg, Mathïs Federico, Charles Kantor, Xavier Koegler, Yann Labbé, Matthew C. H. Lee, Erwan Le Jumeau de Kergaradec, Amir Mahla, Avshalom Manevich, Adrien Maret, Charles Masson, Rafaël Maurin, Arturo Mena, Philippe Modard, Axel Moyal, Axel Nguyen Kerbel, Julien Revelle, Mats L. Richter, María Santos, Laurent Sifre, Maxime Theillard, Marc Thibault, Louis Thiry, Léo Tronchon, Nicolas Usunier, Tony Wu•Jun 3, 2025•272

The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text

Nikhil Kandpal, Brian Lester, Colin Raffel, Sebastian Majstorovic, Stella Biderman, Baber Abbasi, Luca Soldaini, Enrico Shippole, A. Feder Cooper, Aviya Skowron, John Kirchenbauer, Shayne Longpre, Lintang Sutawika, Alon Albalak, Zhenlin Xu, Guilherme Penedo, Loubna Ben Allal, Elie Bakouch, John David Pressman, Honglu Fan, Dashiell Stander, Guangyu Song, Aaron Gokaslan, Tom Goldstein, Brian R. Bartoldson, Bhavya Kailkhura, Tyler Murray•Jun 5, 2025•261

VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models

Xiangdong Zhang, Jiaqi Liao, Shaofeng Zhang, Fanqing Meng, Xiangpeng Wan, Junchi Yan, Yu Cheng•May 29, 2025•242

Aligning Latent Spaces with Flow Priors

Yizhuo Li, Yuying Ge, Yixiao Ge, Ying Shan, Ping Luo•Jun 5, 2025•231

VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos

Hanoona Rasheed, Abdelrahman Shaker, Anqi Tang, Muhammad Maaz, Ming-Hsuan Yang, Salman Khan, Fahad Khan•Jun 5, 2025•221

AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs

Lidong Lu, Guo Chen, Zhiqi Li, Yicheng Liu, Tong Lu•Jun 5, 2025•201

Inference-Time Hyper-Scaling with KV Cache Compression

Adrian Łańcucki, Konrad Staniszewski, Piotr Nawrot, Edoardo M. Ponti•Jun 5, 2025•191

Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations

Linjie Li, Mahtab Bigverdi, Jiawei Gu, Zixian Ma, Yinuo Yang, Ziang Li, Yejin Choi, Ranjay Krishna•Jun 5, 2025•161

SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs

Jiahui Wang, Zuyan Liu, Yongming Rao, Jiwen Lu•Jun 5, 2025•150

StreamBP: Memory-Efficient Exact Backpropagation for Long Sequence Training of LLMs

Qijun Luo, Mengqi Li, Lei Zhao, Xiao Li•Jun 3, 2025•152

EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World?

Yuqian Yuan, Ronghao Dang, Long Li, Wentong Li, Dian Jiao, Xin Li, Deli Zhao, Fan Wang, Wenqiao Zhang, Jun Xiao, Yueting Zhuang•Jun 5, 2025•131

Search Arena: Analyzing Search-Augmented LLMs

Mihran Miroyan, Tsung-Han Wu, Logan King, Tianle Li, Jiayi Pan, Xinyan Hu, Wei-Lin Chiang, Anastasios N. Angelopoulos, Trevor Darrell, Narges Norouzi, Joseph E. Gonzalez•Jun 5, 2025•121

MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning

Xinyan Chen, Renrui Zhang, Dongzhi Jiang, Aojun Zhou, Shilin Yan, Weifeng Lin, Hongsheng Li•Jun 5, 2025•121

Revisiting Depth Representations for Feed-Forward 3D Gaussian Splatting

Duochao Shi, Weijie Wang, Donny Y. Chen, Zeyu Zhang, Jia-Wang Bian, Bohan Zhuang, Chunhua Shen•Jun 5, 2025•111

Evaluation is All You Need: Strategic Overclaiming of LLM Reasoning Capabilities Through Evaluation Design

Lin Sun, Weihong Lin, Jinzhu Wu, Yongfu Zhu, Xiaoqi Jian, Guangxiang Zhao, Change Jia, Linglin Zhang, Sai-er Hu, Yuhan Wu, Xiangzheng Zhang•Jun 5, 2025•113

FlexPainter: Flexible and Multi-View Consistent Texture Generation

Dongyu Yan, Leyi Wu, Jiantao Lin, Luozhou Wang, Tianshuo Xu, Zhifei Chen, Zhen Yang, Lie Xu, Shunsi Zhang, Yingcong Chen•Jun 3, 2025•112

Language-Image Alignment with Fixed Text Encoders

Jingfeng Yang, Ziyang Wu, Yue Zhao, Yi Ma•Jun 4, 2025•106

Autoregressive Images Watermarking through Lexical Biasing: An Approach Resistant to Regeneration Attack

Siqi Hui, Yiren Song, Sanping Zhou, Ye Deng, Wenli Huang, Jinjun Wang•Jun 1, 2025•82

FreeTimeGS: Free Gaussians at Anytime and Anywhere for Dynamic Scene Reconstruction

Yifan Wang, Peishan Yang, Zhen Xu, Jiaming Sun, Zhanhua Zhang, Yong Chen, Hujun Bao, Sida Peng, Xiaowei Zhou•Jun 5, 2025•51

SkyReels-Audio: Omni Audio-Conditioned Talking Portraits in Video Diffusion Transformers

Zhengcong Fei, Hao Jiang, Di Qiu, Baoxuan Gu, Youqiang Zhang, Jiahua Wang, Jialin Bai, Debang Li, Mingyuan Fan, Guibin Chen, Yahui Zhou•Jun 1, 2025•52

Geometry-Editable and Appearance-Preserving Object Composition

Jianman Lin, Haojie Li, Chunmei Qing, Zhijing Yang, Liang Lin, Tianshui Chen•May 27, 2025•52

Kinetics: Rethinking Test-Time Scaling Laws

Ranajoy Sadhukhan, Zhuoming Chen, Haizhong Zheng, Yang Zhou, Emma Strubell, Beidi Chen•Jun 5, 2025•41

Scaling Laws for Robust Comparison of Open Foundation Language-Vision Models and Datasets

Marianna Nezhurina, Tomer Porian, Giovanni Pucceti, Tommie Kerssies, Romain Beaumont, Mehdi Cherti, Jenia Jitsev•Jun 5, 2025•41

MedAgentGym: Training LLM Agents for Code-Based Medical Reasoning at Scale

Ran Xu, Yuchen Zhuang, Yishan Zhong, Yue Yu, Xiangru Tang, Hang Wu, May D. Wang, Peifeng Ruan, Donghan Yang, Tao Wang, Guanghua Xiao, Carl Yang, Yang Xie, Wenqi Shi•Jun 4, 2025•41

Contextual Integrity in LLMs via Reasoning and Reinforcement Learning

Guangchen Lan, Huseyin A. Inan, Sahar Abdelnabi, Janardhan Kulkarni, Lukas Wutschitz, Reza Shokri, Christopher G. Brinton, Robert Sim•May 29, 2025•41

Rectified Point Flow: Generic Point Cloud Pose Estimation

Tao Sun, Liyuan Zhu, Shengyu Huang, Shuran Song, Iro Armeni•Jun 5, 2025•32

Micro-Act: Mitigate Knowledge Conflict in Question Answering via Actionable Self-Reasoning

Nan Huo, Jinyang Li, Bowen Qin, Ge Qu, Xiaolong Li, Xiaodong Li, Chenhao Ma, Reynold Cheng•Jun 5, 2025•31

FEAT: Full-Dimensional Efficient Attention Transformer for Medical Video Generation

Huihan Wang, Zhiwen Yang, Hui Zhang, Dan Zhao, Bingzheng Wei, Yan Xu•Jun 5, 2025•31

Images are Worth Variable Length of Representations

Lingjun Mao, Rodolfo Corona, Xin Liang, Wenhao Yan, Zineng Tang•Jun 4, 2025•32

RobustSplat: Decoupling Densification and Dynamics for Transient-Free 3DGS

Chuanyu Fu, Yuqi Zhang, Kunbin Yao, Guanying Chen, Yuan Xiong, Chuan Huang, Shuguang Cui, Xiaochun Cao•Jun 3, 2025•32

MARBLE: Material Recomposition and Blending in CLIP-Space

Ta-Ying Cheng, Prafull Sharma, Mark Boss, Varun Jampani•Jun 5, 2025•21

FlowDirector: Training-Free Flow Steering for Precise Text-to-Video Editing

Guangzhao Li, Yanming Yang, Chenxi Song, Chi Zhang•Jun 5, 2025•20

Perceptual Decoupling for Scalable Multi-modal Reasoning via Reward-Optimized Captioning

Yunhao Gou, Kai Chen, Zhili Liu, Lanqing Hong, Xin Jin, Zhenguo Li, James T. Kwok, Yu Zhang•Jun 5, 2025•21

BEVCALIB: LiDAR-Camera Calibration via Geometry-Guided Bird's-Eye View Representations

Weiduo Yuan, Jerry Li, Justin Yue, Divyank Shah, Konstantinos Karydis, Hang Qiu•Jun 3, 2025•22

Diffusion-Based Generative Models for 3D Occupancy Prediction in Autonomous Driving

Yunshen Wang, Yicheng Liu, Tianyuan Yuan, Yucheng Mao, Yingshi Liang, Xiuyu Yang, Honggang Zhang, Hang Zhao•May 29, 2025•22

PATS: Proficiency-Aware Temporal Sampling for Multi-View Sports Skill Assessment

Edoardo Bianchi, Antonio Liotta•Jun 5, 2025•11

Watermarking Degrades Alignment in Language Models: Analysis and Mitigation

Apurv Verma, NhatHai Phan, Shubhendu Trivedi•Jun 4, 2025•11

Rethinking Whole-Body CT Image Interpretation: An Abnormality-Centric Approach

Ziheng Zhao, Lisong Dai, Ya Zhang, Yanfeng Wang, Weidi Xie•Jun 3, 2025•12

SViMo: Synchronized Diffusion for Video and Motion Generation in Hand-object Interaction Scenarios

Lingwei Dang, Ruizhi Shao, Hongwen Zhang, Wei Min, Yebin Liu, Qingyao Wu•Jun 3, 2025•13

What do self-supervised speech models know about Dutch? Analyzing advantages of language-specific pre-training

Marianne de Heer Kloots, Hosein Mohebbi, Charlotte Pouw, Gaofei Shen, Willem Zuidema, Martijn Bentum•Jun 1, 2025•12