Fine-Tune an SLM or Prompt an LLM? The Case of Generating Low-Code Workflows

Abstract

Large Language Models (LLMs) such as GPT-4o can handle a wide range of complex tasks with the right prompt. As per-token costs fall, the advantages of fine-tuning Small Language Models (SLMs) for real-world applications (faster inference, lower costs) may no longer be clear. In this work, we present evidence that, for domain-specific tasks that require structured outputs, SLMs still have a quality advantage. We compare fine-tuning an SLM against prompting LLMs on the task of generating low-code workflows in JSON form. We observe that while a good prompt can yield reasonable results, fine-tuning improves quality by 10% on average. We also perform a systematic error analysis to reveal model limitations.
