JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence

Abstract

The scope of neural code intelligence is rapidly expanding beyond text-based source code to encompass the rich visual outputs that programs generate. This visual dimension is critical for advanced applications such as flexible content generation and precise, program-driven editing of visualizations. However, progress has been impeded by the scarcity of high-quality multimodal code data, a bottleneck stemming from challenges in synthesis and quality assessment. To address these challenges, we make contributions on both the data and modeling fronts. We first introduce a complete synthesis toolkit that leverages reciprocal synergies between data modalities to efficiently produce a large-scale, high-quality corpus spanning standard charts, complex interactive web UIs, and code-driven animations. Leveraging this toolkit, we construct JanusCode-800K, the largest multimodal code corpus to date. This corpus powers the training of our models, JanusCoder and JanusCoderV, which establish a visual-programmatic interface for generating code from textual instructions, visual inputs, or a combination of both. Our unified model is a departure from existing approaches that build specialized models for isolated tasks. Extensive experiments on both text-centric and vision-centric coding tasks demonstrate the superior performance of the JanusCoder series, with our 7B to 14B scale models approaching or even exceeding the performance of commercial models. Furthermore, extensive analysis provides key insights into harmonizing programmatic logic with its visual expression. Our code and checkpoints are available at https://github.com/InternLM/JanusCoder.
