ChatPaper.ai
Daily Picks of AI Research Papers
A daily selection of AI research papers, with translations
March 14th, 2025
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
Yi Yang, Xiaoxuan He, Hongkun Pan, Xiyan Jiang, Yan Deng, Xingtao Yang, Haoyu Lu, Dacheng Yin, Fengyun Rao, Minfeng Zhu, Bo Zhang, Wei Chen
•
Mar 13, 2025
•
17
3
Autoregressive Image Generation with Randomized Parallel Decoding
Haopeng Li, Jinyue Yang, Guoqi Li, Huan Wang
•
Mar 13, 2025
•
8
2
Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k
Xiangyu Peng, Zangwei Zheng, Chenhui Shen, Tom Young, Xinying Guo, Binluo Wang, Hang Xu, Hongxin Liu, Mingyan Jiang, Wenjun Li, Yuhui Wang, Anbang Ye, Gang Ren, Qianran Ma, Wanying Liang, Xiang Lian, Xiwen Wu, Yuting Zhong, Zhuangyan Li, Chaoyu Gong, Guojun Lei, Leijun Cheng, Limin Zhang, Minghao Li, Ruijie Zhang, Silan Hu, Shijie Huang, Xiaokang Wang, Yuanheng Zhao, Yuqi Wang, Ziang Wei, Yang You
•
Mar 12, 2025
•
18
3
On the Limitations of Vision-Language Models in Understanding Image Transforms
Ahmad Mustafa Anis, Hasnain Ali, Saquib Sarfraz
•
Mar 12, 2025
•
10
2
Transformers without Normalization
Jiachen Zhu, Xinlei Chen, Kaiming He, Yann LeCun, Zhuang Liu
•
Mar 13, 2025
•
161
5
PerCoV2: Improved Ultra-Low Bit-Rate Perceptual Image Compression with Implicit Hierarchical Masked Image Modeling
Nikolai Körber, Eduard Kromer, Andreas Siebert, Sascha Hauke, Daniel Mueller-Gritschneder, Björn Schuller
•
Mar 12, 2025
•
3
2
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding
Rui Hu, Lianghui Zhu, Yuxuan Zhang, Tianheng Cheng, Lei Liu, Heng Liu, Longjin Ran, Xiaoxin Chen, Wenyu Liu, Xinggang Wang
•
Mar 13, 2025
•
18
2
Studying Classifier(-Free) Guidance From a Classifier-Centric Perspective
Xiaoming Zhao, Alexander G. Schwing
•
Mar 13, 2025
•
2
2
Quantization for OpenAI's Whisper Models: A Comparative Analysis
Allison Andreyev
•
Mar 12, 2025
•
6
2
MinorBench: A hand-built benchmark for content-based risks for children
Shaun Khoo, Gabriel Chua, Rachel Shong
•
Mar 13, 2025
•
4
3
TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention
Jinhao Duan, Fei Kong, Hao Cheng, James Diffenderfer, Bhavya Kailkhura, Lichao Sun, Xiaofeng Zhu, Xiaoshuang Shi, Kaidi Xu
•
Mar 13, 2025
•
4
2
World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning
Siyin Wang, Zhaoye Fei, Qinyuan Cheng, Shiduo Zhang, Panpan Cai, Jinlan Fu, Xipeng Qiu
•
Mar 13, 2025
•
53
7
OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting
Yongsheng Yu, Ziyun Zeng, Haitian Zheng, Jiebo Luo
•
Mar 11, 2025
•
29
2
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search
Yiming Jia, Jiachen Li, Xiang Yue, Bo Li, Ping Nie, Kai Zou, Wenhu Chen
•
Mar 13, 2025
•
23
2
Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo
Zachary Charles, Gabriel Teston, Lucio Dery, Keith Rush, Nova Fallen, Zachary Garrett, Arthur Szlam, Arthur Douillard
•
Mar 12, 2025
•
14
2
Shifting Long-Context LLMs Research from Input to Output
Yuhao Wu, Yushi Bai, Zhiqing Hu, Shangqing Tu, Ming Shan Hee, Juanzi Li, Roy Ka-Wei Lee
•
Mar 6, 2025
•
22
2
Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models
Sangwon Jang, June Suk Choi, Jaehyeong Jo, Kimin Lee, Sung Ju Hwang
•
Mar 12, 2025
•
36
2
The Curse of Conditions: Analyzing and Improving Optimal Transport for Conditional Flow-Based Generation
Ho Kei Cheng, Alexander Schwing
•
Mar 13, 2025
•
3
2
SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation
Junsong Chen, Shuchen Xue, Yuyang Zhao, Jincheng Yu, Sayak Paul, Junyu Chen, Han Cai, Enze Xie, Song Han
•
Mar 12, 2025
•
37
4
CoSTA*: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing
Advait Gupta, NandaKiran Velaga, Dang Nguyen, Tianyi Zhou
•
Mar 13, 2025
•
79
10
DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation
Chen Chen, Rui Qian, Wenze Hu, Tsu-Jui Fu, Lezhi Li, Bowen Zhang, Alex Schwing, Wei Liu, Yinfei Yang
•
Mar 13, 2025
•
17
2
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond
Liang Wen, Yunke Cai, Fenrui Xiao, Xin He, Qi An, Zhenyu Duan, Yimin Du, Junchen Liu, Lifu Tang, Xiaowei Lv, Haosheng Zou, Yongchao Deng, Shousheng Jia, Xiangzheng Zhang
•
Mar 13, 2025
•
28
4
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
Weiyun Wang, Zhangwei Gao, Lianjie Chen, Zhe Chen, Jinguo Zhu, Xiangyu Zhao, Yangzhou Liu, Yue Cao, Shenglong Ye, Xizhou Zhu, Lewei Lu, Haodong Duan, Yu Qiao, Jifeng Dai, Wenhai Wang
•
Mar 13, 2025
•
36
3
CoRe^2: Collect, Reflect and Refine to Generate Better and Faster
Shitong Shao, Zikai Zhou, Dian Xie, Yuetong Fang, Tian Ye, Lichen Bai, Zeke Xie
•
Mar 12, 2025
•
34
4
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
Rongyao Fang, Chengqi Duan, Kun Wang, Linjiang Huang, Hao Li, Shilin Yan, Hao Tian, Xingyu Zeng, Rui Zhao, Jifeng Dai, Xihui Liu, Hongsheng Li
•
Mar 13, 2025
•
50
2
Distilling Diversity and Control in Diffusion Models
Rohit Gandikota, David Bau
•
Mar 13, 2025
•
14
2
CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance
Yufan Deng, Xun Guo, Yizhi Wang, Jacob Zhiyuan Fang, Angtian Wang, Shenghai Yuan, Yiding Yang, Bo Liu, Haibin Huang, Chongyang Ma
•
Mar 13, 2025
•
11
2
Discovering Influential Neuron Path in Vision Transformers
Yifan Wang, Yifei Liu, Yingdong Shi, Changming Li, Anqi Pang, Sibei Yang, Jingyi Yu, Kan Ren
•
Mar 12, 2025
•
6
2
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
Hang Yin, Xiuwei Xu, Lingqing Zhao, Ziwei Wang, Jie Zhou, Jiwen Lu
•
Mar 13, 2025
•
6
2
Piece it Together: Part-Based Concepting with IP-Priors
Elad Richardson, Kfir Goldberg, Yuval Alaluf, Daniel Cohen-Or
•
Mar 13, 2025
•
8
2
ConsisLoRA: Enhancing Content and Style Consistency for LoRA-based Style Transfer
Bolin Chen, Baoquan Zhao, Haoran Xie, Yi Cai, Qing Li, Xudong Mao
•
Mar 13, 2025
•
8
2
Long Context Tuning for Video Generation
Yuwei Guo, Ceyuan Yang, Ziyan Yang, Zhibei Ma, Zhijie Lin, Zhenheng Yang, Dahua Lin, Lu Jiang
•
Mar 13, 2025
•
14
2
PoseLess: Depth-Free Vision-to-Joint Control via Direct Image Mapping with VLM
Alan Dao, Dinh Bach Vu, Tuan Le Duc Anh, Bui Quang Huy
•
Mar 10, 2025
•
3
2
"Silent Is Not Actually Silent": An Investigation of Toxicity on Bug Report Discussion
Mia Mohammad Imran, Jaydeb Sarker
•
Mar 13, 2025
•
4
2
A Frustratingly Simple Yet Highly Effective Attack Baseline: Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1
Zhaoyi Li, Xiaohan Zhao, Dong-Dong Wu, Jiacheng Cui, Zhiqiang Shen
•
Mar 13, 2025
•
3
2
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models
Wanhua Li, Renping Zhou, Jiawei Zhou, Yingwei Song, Johannes Herter, Minghan Qin, Gao Huang, Hanspeter Pfister
•
Mar 13, 2025
•
32
2
Charting and Navigating Hugging Face's Model Atlas
Eliahu Horwitz, Nitzan Kurer, Jonathan Kahana, Liel Amar, Yedid Hoshen
•
Mar 13, 2025
•
79
6
Do I look like a `cat.n.01` to you? A Taxonomy Image Generation Benchmark
Viktor Moskvoretskii, Alina Lobanova, Ekaterina Neminova, Chris Biemann, Alexander Panchenko, Irina Nikishina
•
Mar 13, 2025
•
11
2
New Trends for Modern Machine Translation with Large Reasoning Models
Sinuo Liu, Chenyang Lyu, Minghao Wu, Longyue Wang, Weihua Luo, Kaifu Zhang
•
Mar 13, 2025
•
23
2