

JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence

October 27, 2025
Authors: Qiushi Sun, Jingyang Gong, Yang Liu, Qiaosheng Chen, Lei Li, Kai Chen, Qipeng Guo, Ben Kao, Fei Yuan
cs.AI

Abstract

The scope of neural code intelligence is rapidly expanding beyond text-based source code to encompass the rich visual outputs that programs generate. This visual dimension is critical for advanced applications like flexible content generation and precise, program-driven editing of visualizations. However, progress has been impeded by the scarcity of high-quality multimodal code data, a bottleneck stemming from challenges in synthesis and quality assessment. To address these challenges, we make contributions from both a data and a modeling perspective. We first introduce a complete synthesis toolkit that leverages reciprocal synergies between data modalities to efficiently produce a large-scale, high-quality corpus spanning standard charts, complex interactive web UIs, and code-driven animations. Leveraging this toolkit, we construct JanusCode-800K, the largest multimodal code corpus to date. This powers the training of our models, JanusCoder and JanusCoderV, which establish a visual-programmatic interface for generating code from textual instructions, visual inputs, or a combination of both. Our unified model is a departure from existing approaches that build specialized models for isolated tasks. Extensive experiments on both text-centric and vision-centric coding tasks demonstrate the superior performance of the JanusCoder series, with our 7B to 14B scale models approaching or even exceeding the performance of commercial models. Furthermore, extensive analysis provides key insights into harmonizing programmatic logic with its visual expression. Our code and checkpoints are available at https://github.com/InternLM/JanusCoder.