
Fine-Tune an SLM or Prompt an LLM? The Case of Generating Low-Code Workflows

May 30, 2025
Authors: Orlando Marquez Ayala, Patrice Bechard, Emily Chen, Maggie Baird, Jingfei Chen
cs.AI

Abstract

Large Language Models (LLMs) such as GPT-4o can handle a wide range of complex tasks with the right prompt. As per-token costs are reduced, the advantages of fine-tuning Small Language Models (SLMs) for real-world applications -- faster inference, lower costs -- may no longer be clear. In this work, we present evidence that, for domain-specific tasks that require structured outputs, SLMs still have a quality advantage. We compare fine-tuning an SLM against prompting LLMs on the task of generating low-code workflows in JSON form. We observe that while a good prompt can yield reasonable results, fine-tuning improves quality by 10% on average. We also perform systematic error analysis to reveal model limitations.
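
The abstract does not reproduce the paper's workflow schema, so the snippet below is only a hypothetical sketch of the kind of structured JSON output the task targets. All field names (`trigger`, `steps`, `component`, `inputs`) are illustrative assumptions, not the authors' actual format; the point is that the model must emit a well-formed, schema-conformant object, which is where fine-tuned SLMs are claimed to hold a quality edge.

```json
{
  "_comment": "Hypothetical low-code workflow; field names are assumptions for illustration only.",
  "trigger": {
    "type": "record_created",
    "table": "incident"
  },
  "steps": [
    {
      "id": 1,
      "component": "lookup_record",
      "inputs": { "table": "user", "match": "incident.caller_id" }
    },
    {
      "id": 2,
      "component": "send_notification",
      "inputs": { "recipient": "step_1.output.email", "template": "incident_ack" }
    }
  ]
}
```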
