On Data Engineering for Scaling LLM Terminal Capabilities

Abstract

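Despite rapid recent progress in the terminal capabilities of large language models, the training data strategies behind state-of-the-art terminal agents remain largely undisclosed. We address this gap with a systematic study of data engineering practices for terminal agents and make two main contributions: (1) Terminal-Task-Gen, a lightweight synthetic task generation pipeline that supports both seed-based and skill-based task construction, and (2) a comprehensive analysis of data and training strategies, covering filtering, curriculum learning, long-context training, and scaling behavior. Our pipeline produces Terminal-Corpus, a large-scale open-source dataset of terminal tasks. Using this dataset, we train Nemotron-Terminal, a family of models initialized from Qwen3 (8B, 14B, and 32B) that achieves substantial gains on Terminal-Bench 2.0: Nemotron-Terminal-8B improves from 2.5% to 13.0%, Nemotron-Terminal-14B from 4.0% to 20.2%, and Nemotron-Terminal-32B from 3.4% to 27.4%, matching the performance of significantly larger models. To accelerate research in this area, we open-source our model checkpoints and most of our synthetic datasets at https://huggingface.co/collections/nvidia/nemotron-terminal.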
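To make the size of the reported Terminal-Bench 2.0 gains concrete, the short Python sketch here computes the absolute gain (in percentage points) and the relative multiplier for each model size. It uses only the rounded percentages quoted in the abstract; the dictionary and the `gains` helper are illustrative names, not part of the released code.

```python
# Terminal-Bench 2.0 scores (%) quoted in the abstract:
# (Qwen3 base model, Nemotron-Terminal after training). Illustrative only.
SCORES = {
    "8B": (2.5, 13.0),
    "14B": (4.0, 20.2),
    "32B": (3.4, 27.4),
}


def gains(base: float, tuned: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative multiplier)."""
    absolute = round(tuned - base, 1)
    relative = round(tuned / base, 2)
    return absolute, relative


if __name__ == "__main__":
    for size, (base, tuned) in SCORES.items():
        abs_gain, ratio = gains(base, tuned)
        print(f"{size}: +{abs_gain:.1f} pp, {ratio:.2f}x")
```

Worked by hand, the absolute gains are 10.5, 16.2, and 24.0 percentage points, and the relative multipliers are about 5.2x, 5.05x, and 8.06x. Because the inputs are rounded headline scores, these ratios are only a rough summary and say nothing about per-task variance.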

