SkCC: Portable and Secure Skill Compilation for Cross-Framework LLM Agents

Abstract

LLM agents have evolved into autonomous systems for complex task execution, and the SKILL.md specification has emerged as a de facto standard for encapsulating agent capabilities. However, a critical bottleneck remains: agent frameworks differ sharply in their sensitivity to prompt formatting, causing up to 40% performance variation, yet nearly all skills exist as a single, format-agnostic Markdown version. Manual per-platform rewriting creates an unsustainable maintenance burden, and prior audits have found that more than one third of community skills contain security vulnerabilities. To address both problems, we present SkCC, a compilation framework that brings classical compiler design to agent skill development. At its core, SkIR, a strongly typed intermediate representation, decouples skill semantics from platform-specific formatting, enabling portable deployment across heterogeneous agent frameworks. Built around this IR, a compile-time Analyzer enforces security constraints via Anti-Skill Injection before deployment. Through a four-phase pipeline, SkCC reduces adaptation complexity from O(m × n) to O(m + n). Experiments on SkillsBench show that compiled skills consistently outperform their original counterparts, improving pass rates from 21.1% to 33.3% on Claude Code and from 35.1% to 48.7% on Kimi CLI, while achieving sub-10ms compilation latency, a 94.8% proactive security trigger rate, and 10-46% runtime token savings across platforms.
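
The complexity claim can be illustrated with a minimal sketch. It is not SkCC's actual API: it assumes m counts skills and n counts target frameworks (the abstract does not define either symbol), and every name in it (`SkillIR`, `parse_skill`, `emit_markdown`, `emit_xml`, `adaptation_cost`) is hypothetical. Each skill is parsed once into a shared IR and each framework needs one emitter, so the number of hand-written artifacts grows as m + n rather than m × n.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillIR:
    """Hypothetical stand-in for a typed IR: a skill name plus ordered steps."""
    name: str
    steps: tuple[str, ...]


def parse_skill(markdown: str) -> SkillIR:
    """Front end: one parse per skill (m parses in total)."""
    lines = [ln.strip() for ln in markdown.splitlines() if ln.strip()]
    name = lines[0].lstrip("# ").strip()
    steps = tuple(ln[2:].strip() for ln in lines[1:] if ln.startswith("- "))
    return SkillIR(name, steps)


def emit_markdown(ir: SkillIR) -> str:
    """Back end for a framework assumed to prefer Markdown lists."""
    body = "\n".join(f"{i}. {s}" for i, s in enumerate(ir.steps, start=1))
    return f"## {ir.name}\n{body}"


def emit_xml(ir: SkillIR) -> str:
    """Back end for a framework assumed to prefer XML-style tags."""
    body = "".join(f"<step>{s}</step>" for s in ir.steps)
    return f'<skill name="{ir.name}">{body}</skill>'


# One emitter per framework (n emitters in total); the keys are illustrative.
BACKENDS = {"markdown_style": emit_markdown, "xml_style": emit_xml}


def adaptation_cost(m: int, n: int) -> dict[str, int]:
    """Count hand-written artifacts: per-pair rewrites vs. a shared IR."""
    return {"manual": m * n, "with_ir": m + n}


skill = parse_skill("# Deploy\n- build the image\n- push the image\n")
outputs = {target: emit(skill) for target, emit in BACKENDS.items()}
```

Under these assumptions, 50 skills and 4 frameworks would need 200 manual rewrites but only 54 artifacts with a shared IR (50 parsed skills plus 4 emitters).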