ChatPaper.ai
Daily Picks of AI Research Papers
Daily curated AI research papers, with translations
May 15th, 2025
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset
Jiuhai Chen, Zhiyang Xu, Xichen Pan, Yushi Hu, Can Qin, Tom Goldstein, Lifu Huang, Tianyi Zhou, Saining Xie, Silvio Savarese, Le Xue, Caiming Xiong, Ran Xu
May 14, 2025 • 40 • 1
DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception
Junjie Wang, Bin Chen, Yulin Li, Bin Kang, Yichi Chen, Zhuotao Tian
May 7, 2025 • 35 • 1
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Huazuo Gao, Jiashi Li, Liyue Zhang, Panpan Huang, Shangyan Zhou, Shirong Ma, Wenfeng Liang, Ying He, Yuqing Wang, Yuxuan Liu, Y. X. Wei
May 14, 2025 • 22 • 1
Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis
Bingxin Ke, Kevin Qu, Tianfu Wang, Nando Metzger, Shengyu Huang, Bo Li, Anton Obukhov, Konrad Schindler
May 14, 2025 • 13 • 1
UniSkill: Imitating Human Videos via Cross-Embodiment Skill Representations
Hanjung Kim, Jaehyun Kang, Hyolim Kang, Meedeum Cho, Seon Joo Kim, Youngwoon Lee
May 13, 2025 • 12 • 1
SweRank: Software Issue Localization with Code Ranking
Revanth Gangi Reddy, Tarun Suresh, JaeHyeok Doo, Ye Liu, Xuan Phi Nguyen, Yingbo Zhou, Semih Yavuz, Caiming Xiong, Heng Ji, Shafiq Joty
May 7, 2025 • 6 • 1
CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image
Kaixin Yao, Longwen Zhang, Xinhao Yan, Yan Zeng, Qixuan Zhang, Lan Xu, Wei Yang, Jiayuan Gu, Jingyi Yu
Feb 18, 2025 • 5 • 2
WavReward: Spoken Dialogue Models With Generalist Reward Evaluators
Shengpeng Ji, Tianle Liang, Yangzhuo Li, Jialong Zuo, Minghui Fang, Jinzheng He, Yifu Chen, Zhengqing Liu, Ziyue Jiang, Xize Cheng, Siqi Zheng, Jin Xu, Junyang Lin, Zhou Zhao
May 14, 2025 • 4 • 2
Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?
Andrew Rouditchenko, Saurabhchand Bhati, Edson Araujo, Samuel Thomas, Hilde Kuehne, Rogerio Feris, James Glass
May 14, 2025 • 4 • 1
VCRBench: Exploring Long-form Causal Reasoning Capabilities of Large Video Language Models
Pritam Sarkar, Ali Etemad
May 13, 2025 • 4 • 1
DetReIDX: A Stress-Test Dataset for Real-World UAV-Based Person Recognition
Kailash A. Hambarde, Nzakiese Mbongo, Pavan Kumar MP, Satish Mekewad, Carolina Fernandes, Gökhan Silahtaroğlu, Alice Nithya, Pawan Wasnik, MD. Rashidunnabi, Pranita Samale, Hugo Proença
May 7, 2025 • 2 • 1
Visually Interpretable Subtask Reasoning for Visual Question Answering
Yu Cheng, Arushi Goel, Hakan Bilen
May 12, 2025 • 1 • 1
LightLab: Controlling Light Sources in Images with Diffusion Models
Nadav Magar, Amir Hertz, Eric Tabellion, Yael Pritch, Alex Rav-Acha, Ariel Shamir, Yedid Hoshen
May 14, 2025 • 0 • 1
Behind Maya: Building a Multilingual Vision Language Model
Nahid Alam, Karthik Reddy Kanjula, Surya Guthikonda, Timothy Chung, Bala Krishna S Vegesna, Abhipsha Das, Anthony Susevski, Ryan Sze-Yin Chan, S M Iftekhar Uddin, Shayekh Bin Islam, Roshan Santhosh, Snegha A, Drishti Sharma, Chen Liu, Isha Chaturvedi, Genta Indra Winata, Ashvanth. S, Snehanshu Mukherjee, Alham Fikri Aji
May 13, 2025 • 0 • 1
Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA
Karthik Reddy Kanjula, Surya Guthikonda, Nahid Alam, Shayekh Bin Islam
May 9, 2025 • 0 • 1
Steepest Descent Density Control for Compact 3D Gaussian Splatting
Peihao Wang, Yuehao Wang, Dilin Wang, Sreyas Mohan, Zhiwen Fan, Lemeng Wu, Ruisi Cai, Yu-Ying Yeh, Zhangyang Wang, Qiang Liu, Rakesh Ranjan
May 8, 2025 • 0 • 1