Skills-Coach: A Self-Evolving Skill Optimizer via Training-Free GRPO

Abstract

We introduce Skills-Coach, an automated framework that drives the self-evolution of skills in Large Language Model (LLM)-based agents. To address the fragmentation of the current skill ecosystem, Skills-Coach probes the limits of skill capabilities, helping secure the broad competency coverage that intelligent applications require. The framework comprises four core modules: a Diverse Task Generation Module that builds a comprehensive test suite for systematically evaluating varied skills; a Lightweight Optimization Module that optimizes skill prompts and their corresponding code; a Comparative Execution Module that executes and evaluates both the original and the optimized skills; and a Traceable Evaluation Module that rigorously scores performance against specified criteria. Skills-Coach supports two execution modes, virtual and real. To validate the framework, we introduce Skill-X, a benchmark dataset of 48 diverse skills. Experiments show that Skills-Coach significantly improves skill capability across a wide range of categories, indicating its potential to advance the development of more robust and adaptable LLM-based agents.
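The four-module loop described in the abstract can be pictured with a minimal Python sketch. Everything here is an assumption made for illustration: the names `Skill`, `generate_tasks`, `optimize`, `run`, `evaluate`, and `coach` are hypothetical, the stubs stand in for what would presumably be LLM calls, and the keyword-based criteria check is only a placeholder for real evaluation. It does not reproduce the authors' implementation, nor the Training-Free GRPO procedure named in the title.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Tuple


@dataclass
class Skill:
    name: str
    prompt: str


def generate_tasks(skill: Skill, n: int) -> List[str]:
    """Diverse Task Generation (stub): produce n distinct test tasks."""
    return [f"{skill.name} task {i}" for i in range(n)]


def optimize(skill: Skill) -> Skill:
    """Lightweight Optimization (stub): return a revised copy of the skill."""
    return Skill(skill.name, skill.prompt + " Be precise and state assumptions.")


def run(skill: Skill, task: str) -> str:
    """Comparative Execution (stub): 'execute' a skill on one task."""
    return f"[{skill.prompt}] {task}"


def evaluate(output: str, criteria: List[str]) -> Dict:
    """Traceable Evaluation (stub): keep a per-criterion record and a score."""
    checks = {c: c.lower() in output.lower() for c in criteria}
    return {"checks": checks, "score": sum(checks.values()) / len(criteria)}


def coach(skill: Skill, criteria: List[str], n_tasks: int = 3) -> Tuple[Skill, Dict]:
    """Run original and optimized skills on the same tasks; keep the better one."""
    tasks = generate_tasks(skill, n_tasks)
    candidate = optimize(skill)
    base = mean(evaluate(run(skill, t), criteria)["score"] for t in tasks)
    new = mean(evaluate(run(candidate, t), criteria)["score"] for t in tasks)
    chosen = candidate if new > base else skill
    return chosen, {"original": base, "optimized": new}


if __name__ == "__main__":
    best, report = coach(Skill("summarize", "Summarize the text."),
                         ["precise", "assumptions"])
    print(best.prompt, report)
```

The one design point the sketch is meant to convey is that both skill versions are scored on the same generated tasks against the same criteria, and that the per-criterion `checks` record is kept so that a score can be traced back to the criteria behind it.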

