C-RADIOv4 (Technical Report)

Abstract

By leveraging multi-teacher distillation, agglomerative vision backbones provide a unified student model that retains and improves on the distinct capabilities of multiple teachers. In this tech report, we describe the most recent release of the C-RADIO family of models, C-RADIOv4, which builds on the design of AM-RADIO/RADIOv2.5 and offers strong improvements on key downstream tasks at the same computational complexity. We release two model variants, -SO400M (412M parameters) and -H (631M parameters), both trained with an updated set of teachers: SigLIP2, DINOv3, and SAM3. In addition to improvements on core metrics and new capabilities gained from imitating SAM3, the C-RADIOv4 model family further improves any-resolution support, brings back the ViTDet option for drastically better efficiency at high resolution, and comes with a permissive license.