Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs

Abstract

Scientific figures are among the most effective means of communicating complex research ideas, yet producing publication-quality illustrations remains one of the most labor-intensive parts of paper preparation. Existing automated systems each target a single figure type under text-only input, so they leave unaddressed the diversity of figure types and input conditions researchers actually use; moreover, their raster outputs cannot be revised locally. Because scientific figures are structured compositions of discrete semantic components, the localized errors that generators produce on such layouts call for a harness rather than a stronger backbone. We instantiate this harness in two complementary systems: Crafter, a multi-agent harness for figure generation that generalizes across figure types and input conditions without architectural changes, and CraftEditor, which applies the same pattern to convert raster outputs into editable SVGs. We also introduce CraftBench, a benchmark spanning three figure types and four input conditions with human quality annotations. Experiments show that Crafter substantially outperforms both standalone generators and the agentic baseline on PaperBanana-Bench and CraftBench, with ablations confirming each component's independent contribution; CraftEditor faithfully converts outputs into editable SVGs and surpasses all baselines. Our code and benchmark are available at https://github.com/HaozheZhao/Crafter.