JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence

Abstract

The scope of neural code intelligence is rapidly expanding beyond text-based source code to encompass the rich visual outputs that programs generate. This visual dimension is critical for advanced applications such as flexible content generation and precise, program-driven editing of visualizations. However, progress has been impeded by the scarcity of high-quality multimodal code data, a bottleneck stemming from challenges in synthesis and quality assessment. To address these challenges, we make contributions from both data and modeling perspectives. We first introduce a complete synthesis toolkit that leverages reciprocal synergies between data modalities to efficiently produce a large-scale, high-quality corpus spanning standard charts, complex interactive web UIs, and code-driven animations. Leveraging this toolkit, we construct JanusCode-800K, the largest multimodal code corpus to date. This corpus powers the training of our models, JanusCoder and JanusCoderV, which establish a visual-programmatic interface for generating code from textual instructions, visual inputs, or a combination of both. Our unified model is a departure from existing approaches that build specialized models for isolated tasks. Extensive experiments on both text-centric and vision-centric coding tasks demonstrate the superior performance of the JanusCoder series, with our 7B to 14B scale models approaching or even exceeding the performance of commercial models. Furthermore, extensive analysis provides key insights into harmonizing programmatic logic with its visual expression. Our code and checkpoints are available at https://github.com/InternLM/JanusCoder.
