ChatPaper.ai
Daily AI Research Paper Picks
Daily curated AI research papers with translations
March 26th, 2025
Long-Context Autoregressive Video Modeling with Next-Frame Prediction
Yuchao Gu, Weijia Mao, Mike Zheng Shou
•
Mar 25, 2025
•
72
2
Scaling Vision Pre-Training to 4K Resolution
Baifeng Shi, Boyi Li, Han Cai, Yao Lu, Sifei Liu, Marco Pavone, Jan Kautz, Song Han, Trevor Darrell, Pavlo Molchanov, Hongxu Yin
•
Mar 25, 2025
•
40
2
Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing
Jaihoon Kim, Taehoon Yoon, Jisung Hwang, Minhyuk Sung
•
Mar 25, 2025
•
33
4
Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation
Hongcheng Gao, Jiashu Qu, Jingyi Tang, Baolong Bi, Yue Liu, Hongyu Chen, Li Liang, Li Su, Qingming Huang
•
Mar 25, 2025
•
31
4
CoMP: Continual Multimodal Pre-training for Vision Foundation Models
Yitong Chen, Lingchen Meng, Wujian Peng, Zuxuan Wu, Yu-Gang Jiang
•
Mar 24, 2025
•
30
1
Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking
Xiaoyu Tian, Sitong Zhao, Haotian Wang, Shuaiting Chen, Yunjie Ji, Yiping Peng, Han Zhao, Xiangang Li
•
Mar 25, 2025
•
26
5
Spot the Fake: Large Multimodal Model-Based Synthetic Image Detection with Artifact Explanation
Siwei Wen, Junyan Ye, Peilin Feng, Hengrui Kang, Zichen Wen, Yize Chen, Jiang Wu, Wenjun Wu, Conghui He, Weijia Li
•
Mar 19, 2025
•
20
3
MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding
Siwei Han, Peng Xia, Ruiyi Zhang, Tong Sun, Yun Li, Hongtu Zhu, Huaxiu Yao
•
Mar 18, 2025
•
19
2
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
Mingyang Chen, Tianpeng Li, Haoze Sun, Yijie Zhou, Chenzheng Zhu, Fan Yang, Zenan Zhou, Weipeng Chen, Haofen Wang, Jeff Z. Pan, Wen Zhang, Huajun Chen
•
Mar 25, 2025
•
17
3
CoLLM: A Large Language Model for Composed Image Retrieval
Chuong Huynh, Jinyu Yang, Ashish Tawari, Mubarak Shah, Son Tran, Raffay Hamid, Trishul Chilimbi, Abhinav Shrivastava
•
Mar 25, 2025
•
14
2
WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation
Zhongyu Yang, Jun Chen, Dannong Xu, Junjie Fei, Xiaoqian Shen, Liangbing Zhao, Chun-Mei Feng, Mohamed Elhoseiny
•
Mar 24, 2025
•
11
2
Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models
Jinho Jeong, Sangmin Han, Jinwoo Kim, Seon Joo Kim
•
Mar 24, 2025
•
10
1
FullDiT: Multi-Task Video Generative Foundation Model with Full Attention
Xuan Ju, Weicai Ye, Quande Liu, Qiulin Wang, Xintao Wang, Pengfei Wan, Di Zhang, Kun Gai, Qiang Xu
•
Mar 25, 2025
•
8
2
DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis
Yuming Gu, Phong Tran, Yujian Zheng, Hongyi Xu, Heyuan Li, Adilbek Karmanov, Hao Li
•
Mar 19, 2025
•
8
2
FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement
Ian Huang, Yanan Bao, Karen Truong, Howard Zhou, Cordelia Schmid, Leonidas Guibas, Alireza Fathi
•
Mar 6, 2025
•
8
2
PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos
Hanxiao Jiang, Hao-Yu Hsu, Kaifeng Zhang, Hsin-Ni Yu, Shenlong Wang, Yunzhu Li
•
Mar 23, 2025
•
7
2
LookAhead Tuning: Safer Language Models via Partial Answer Previews
Kangwei Liu, Mengru Wang, Yujie Luo, Lin Yuan, Mengshu Sun, Ningyu Zhang, Lei Liang, Zhiqiang Zhang, Jun Zhou, Huajun Chen
•
Mar 24, 2025
•
5
3
Efficient Model Development through Fine-tuning Transfer
Pin-Jie Lin, Rishab Balasubramanian, Fengyuan Liu, Nikhil Kandpal, Tu Vu
•
Mar 25, 2025
•
4
2
FRESA: Feedforward Reconstruction of Personalized Skinned Avatars from Few Images
Rong Wang, Fabian Prada, Ziyan Wang, Zhongshi Jiang, Chengxiang Yin, Junxuan Li, Shunsuke Saito, Igor Santesteban, Javier Romero, Rohan Joshi, Hongdong Li, Jason Saragih, Yaser Sheikh
•
Mar 24, 2025
•
4
2
xKV: Cross-Layer SVD for KV-Cache Compression
Chi-Chih Chang, Chien-Yu Lin, Yash Akhauri, Wei-Cheng Lin, Kai-Chiang Wu, Luis Ceze, Mohamed S. Abdelfattah
•
Mar 24, 2025
•
4
1
Gumbel-Softmax Flow Matching with Straight-Through Guidance for Controllable Biological Sequence Generation
Sophia Tang, Yinuo Zhang, Alexander Tong, Pranam Chatterjee
•
Mar 21, 2025
•
4
2
Strong Baseline: Multi-UAV Tracking via YOLOv12 with BoT-SORT-ReID
Yu-Hsi Chen
•
Mar 21, 2025
•
4
5
When Words Outperform Vision: VLMs Can Self-Improve Via Text-Only Training For Human-Centered Decision Making
Zhe Hu, Jing Li, Yu Yin
•
Mar 21, 2025
•
4
2
Towards a Unified Copernicus Foundation Model for Earth Vision
Yi Wang, Zhitong Xiong, Chenying Liu, Adam J. Stewart, Thomas Dujardin, Nikolaos Ioannis Bountos, Angelos Zavras, Franziska Gerken, Ioannis Papoutsis, Laura Leal-Taixé, Xiao Xiang Zhu
•
Mar 14, 2025
•
4
3
LLaVAction: evaluating and training multi-modal large language models for action recognition
Shaokai Ye, Haozhe Qi, Alexander Mathis, Mackenzie W. Mathis
•
Mar 24, 2025
•
3
2
Any6D: Model-free 6D Pose Estimation of Novel Objects
Taeyeop Lee, Bowen Wen, Minjun Kang, Gyuree Kang, In So Kweon, Kuk-Jin Yoon
•
Mar 24, 2025
•
3
2
OpenCity3D: What do Vision-Language Models know about Urban Environments?
Valentin Bieri, Marco Zamboni, Nicolas S. Blumer, Qingxuan Chen, Francis Engelmann
•
Mar 21, 2025
•
3
2
Can Vision-Language Models Answer Face to Face Questions in the Real-World?
Reza Pourreza, Rishit Dagli, Apratim Bhattacharyya, Sunny Panchal, Guillaume Berger, Roland Memisevic
•
Mar 25, 2025
•
2
2
Overcoming Vocabulary Mismatch: Vocabulary-agnostic Teacher Guided Language Modeling
Haebin Shin, Lei Ji, Xiao Liu, Yeyun Gong
•
Mar 24, 2025
•
2
2
Frequency Dynamic Convolution for Dense Image Prediction
Linwei Chen, Lin Gu, Liang Li, Chenggang Yan, Ying Fu
•
Mar 24, 2025
•
2
2
LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation
Vladan Stojnić, Yannis Kalantidis, Jiří Matas, Giorgos Tolias
•
Mar 25, 2025
•
1
2
ST-VLM: Kinematic Instruction Tuning for Spatio-Temporal Reasoning in Vision-Language Models
Dohwan Ko, Sihyeon Kim, Yumin Suh, Vijay Kumar B. G, Minseo Yoon, Manmohan Chandraker, Hyunwoo J. Kim
•
Mar 25, 2025
•
1
1
Co-SemDepth: Fast Joint Semantic Segmentation and Depth Estimation on Aerial Images
Yara AlaaEldin, Francesca Odone
•
Mar 23, 2025
•
0
2