JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence
October 27, 2025
Authors: Qiushi Sun, Jingyang Gong, Yang Liu, Qiaosheng Chen, Lei Li, Kai Chen, Qipeng Guo, Ben Kao, Fei Yuan
cs.AI
Abstract
The scope of neural code intelligence is rapidly expanding beyond text-based
source code to encompass the rich visual outputs that programs generate. This
visual dimension is critical for advanced applications like flexible content
generation and precise, program-driven editing of visualizations. However,
progress has been impeded by the scarcity of high-quality multimodal code data,
a bottleneck stemming from challenges in synthesis and quality assessment. To
address these challenges, we make contributions from both a data and modeling
perspective. We first introduce a complete synthesis toolkit that leverages
reciprocal synergies between data modalities to efficiently produce a
large-scale, high-quality corpus ranging from standard charts to complex
interactive web UIs and code-driven animations. Leveraging this toolkit, we
construct JanusCode-800K, the largest multimodal code corpus to date. This
powers the training of our models, JanusCoder and JanusCoderV, which establish
a visual-programmatic interface for generating code from textual instructions,
visual inputs, or a combination of both. Our unified model is a departure from
existing approaches that build specialized models for isolated tasks. Extensive
experiments on both text-centric and vision-centric coding tasks demonstrate
the superior performance of the JanusCoder series, with our 7B- to 14B-scale
models approaching or even exceeding the performance of commercial models.
Furthermore, in-depth analysis provides key insights into harmonizing
programmatic logic with its visual expression. Our code and checkpoints
are available at https://github.com/InternLM/JanusCoder.
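
For concreteness, the snippet below is a minimal inference sketch in Python showing how a released JanusCoderV checkpoint might be queried for vision-conditioned code generation. It assumes the checkpoints ship in a Hugging Face transformers-compatible format; the model ID "internlm/JanusCoderV-7B", the input image file, and the prompt wording are hypothetical placeholders, so consult the repository above for the actual loading and inference recipe.

    # Hypothetical usage sketch for JanusCoderV (vision-conditioned code
    # generation). Model ID, file names, and prompt format are assumptions,
    # not confirmed by the paper; see the GitHub repository for specifics.
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    model_id = "internlm/JanusCoderV-7B"  # hypothetical checkpoint name

    # Load the processor and model; trust_remote_code is typical for
    # vision-language models that ship custom modeling code.
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
        trust_remote_code=True,
    )

    # A vision-centric coding task: reproduce a rendered chart as code.
    image = Image.open("target_chart.png")  # placeholder input image
    instruction = "Write matplotlib code that reproduces this chart."

    # The processor combines the textual instruction and the visual input,
    # reflecting the paper's text-plus-image conditioning interface.
    inputs = processor(text=instruction, images=image, return_tensors="pt")
    inputs = inputs.to(model.device)

    output_ids = model.generate(**inputs, max_new_tokens=512)
    print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])

The same interface would accept a text-only instruction (omitting the image) or both modalities together, matching the unified visual-programmatic interface described in the abstract.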