Fine-Tune an SLM or Prompt an LLM? The Case of Generating Low-Code Workflows

Abstract

Large Language Models (LLMs) such as GPT-4o can handle a wide range of complex tasks given the right prompt. As per-token costs fall, the advantages of fine-tuning Small Language Models (SLMs) for real-world applications (faster inference and lower costs) may no longer be clear. In this work, we present evidence that, for domain-specific tasks that require structured outputs, SLMs still have a quality advantage. We compare fine-tuning an SLM against prompting LLMs on the task of generating low-code workflows in JSON form. We observe that while a good prompt can yield reasonable results, fine-tuning improves quality by 10% on average. We also perform a systematic error analysis to reveal model limitations.

