SkinFlow: Efficient Information Transmission for Open Dermatological Diagnosis via Dynamic Visual Encoding and Staged RL

January 14, 2026
Authors: Lijun Liu, Linwei Chen, Zhishou Zhang, Meng Tian, Hengfu Cui, Ruiyang Li, Zhaocheng Liu, Qiang Ju, Qianxi Li, Hong-Yu Zhou
cs.AI

Abstract

General-purpose Large Vision-Language Models (LVLMs), despite their massive scale, often falter in dermatology because of "diffuse attention": the inability to disentangle subtle pathological lesions from background noise. In this paper, we challenge the assumption that parameter scaling is the only path to medical precision. We introduce SkinFlow, a framework that treats diagnosis as an optimization of visual information transmission efficiency. Our approach uses a Virtual-Width Dynamic Vision Encoder (DVE) to "unfold" complex pathological manifolds without physical parameter expansion, coupled with a two-stage Reinforcement Learning (RL) strategy. This strategy first aligns explicit medical descriptions (Stage I) and then reconstructs implicit diagnostic textures (Stage II) within a constrained semantic space. Furthermore, we propose a clinically grounded evaluation protocol that prioritizes diagnostic safety and hierarchical relevance over rigid label matching. Empirically, our 7B model establishes a new state of the art on the Fitzpatrick17k benchmark, achieving a gain of 12.06% in Top-1 accuracy and 28.57% in Top-6 accuracy over massive general-purpose models (e.g., Qwen3VL-235B and GPT-5.2). These findings demonstrate that optimizing geometric capacity and information flow yields superior diagnostic reasoning compared to raw parameter scaling.
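
For readers unfamiliar with the Top-1 and Top-6 figures quoted in the abstract, the following is a minimal sketch of how Top-k accuracy is conventionally computed under plain label matching. It is not the authors' evaluation code and does not implement their clinically grounded protocol; the function name `top_k_accuracy` and the toy diagnoses are assumptions made purely for illustration.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    labels: Sequence[str],
    k: int,
) -> float:
    """Fraction of cases whose true label appears among the first k ranked predictions."""
    if len(ranked_predictions) != len(labels):
        raise ValueError("need exactly one ranked prediction list per label")
    if not labels:
        raise ValueError("evaluation set is empty")
    hits = sum(
        1
        for preds, truth in zip(ranked_predictions, labels)
        if truth in preds[:k]
    )
    return hits / len(labels)


# Toy, made-up data (hypothetical; not taken from Fitzpatrick17k).
# Each inner list is a model's ranked differential diagnosis for one image.
ranked = [
    ["psoriasis", "eczema", "lichen planus"],
    ["acne vulgaris", "rosacea", "folliculitis"],
    ["eczema", "psoriasis", "scabies"],
    ["melanoma", "nevus", "seborrheic keratosis"],
]
truth = ["psoriasis", "rosacea", "scabies", "basal cell carcinoma"]

top1 = top_k_accuracy(ranked, truth, k=1)  # only the first case is a Top-1 hit
top3 = top_k_accuracy(ranked, truth, k=3)  # the first three cases are Top-3 hits
print(f"Top-1: {top1:.2f}, Top-3: {top3:.2f}")
```

Under this convention a larger k can only keep or raise the score, which is why Top-6 figures are typically higher than Top-1 figures for the same model.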