

Fine-Tune an SLM or Prompt an LLM? The Case of Generating Low-Code Workflows

May 30, 2025
作者: Orlando Marquez Ayala, Patrice Bechard, Emily Chen, Maggie Baird, Jingfei Chen
cs.AI

Abstract

Large Language Models (LLMs) such as GPT-4o can handle a wide range of complex tasks with the right prompt. As per-token costs are reduced, the advantages of fine-tuning Small Language Models (SLMs) for real-world applications -- faster inference, lower costs -- may no longer be clear. In this work, we present evidence that, for domain-specific tasks that require structured outputs, SLMs still have a quality advantage. We compare fine-tuning an SLM against prompting LLMs on the task of generating low-code workflows in JSON form. We observe that while a good prompt can yield reasonable results, fine-tuning improves quality by 10% on average. We also perform systematic error analysis to reveal model limitations.
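For illustration, a low-code workflow of the kind the abstract describes is typically a structured JSON document specifying a trigger and a sequence of steps. The sketch below is a hypothetical example of such an output; the schema and field names (`trigger`, `steps`, `action`, and so on) are illustrative assumptions, not the format used in the paper.

```json
{
  "name": "notify_on_high_priority_ticket",
  "trigger": {
    "type": "record_created",
    "table": "incident",
    "condition": "priority == 1"
  },
  "steps": [
    {
      "id": "lookup_assignee",
      "action": "query_record",
      "inputs": { "table": "user", "filter": "group == 'on_call'" }
    },
    {
      "id": "send_alert",
      "action": "send_email",
      "inputs": {
        "to": "{{lookup_assignee.email}}",
        "subject": "High-priority incident created"
      }
    }
  ]
}
```

Outputs like this make the quality comparison concrete: a model must produce valid JSON, a coherent step graph, and correct references between steps, which is where structured-output errors can be measured systematically.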