DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
October 5, 2023
Authors: Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi, Hanna Moazam, Heather Miller, Matei Zaharia, Christopher Potts
cs.AI
Abstract
The ML community is rapidly exploring techniques for prompting language
models (LMs) and for stacking them into pipelines that solve complex tasks.
Unfortunately, existing LM pipelines are typically implemented using hard-coded
"prompt templates", i.e. lengthy strings discovered via trial and error. Toward
a more systematic approach for developing and optimizing LM pipelines, we
introduce DSPy, a programming model that abstracts LM pipelines as text
transformation graphs, i.e. imperative computational graphs where LMs are
invoked through declarative modules. DSPy modules are parameterized, meaning
they can learn (by creating and collecting demonstrations) how to apply
compositions of prompting, finetuning, augmentation, and reasoning techniques.
We design a compiler that will optimize any DSPy pipeline to maximize a given
metric. We conduct two case studies, showing that succinct DSPy programs can
express and optimize sophisticated LM pipelines that reason about math word
problems, tackle multi-hop retrieval, answer complex questions, and control
agent loops. Within minutes of compiling, a few lines of DSPy allow GPT-3.5 and
llama2-13b-chat to self-bootstrap pipelines that outperform standard few-shot
prompting (generally by over 25% and 65%, respectively) and pipelines with
expert-created demonstrations (by up to 5-46% and 16-40%, respectively). On top
of that, DSPy programs compiled to open and relatively small LMs like
770M-parameter T5 and llama2-13b-chat are competitive with approaches that rely
on expert-written prompt chains for proprietary GPT-3.5. DSPy is available at
https://github.com/stanfordnlp/dspy
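
For readers unfamiliar with the library, the following is a minimal sketch of what "a few lines of DSPy" and a compilation step can look like, based on the public API of https://github.com/stanfordnlp/dspy around the time of this paper. The model name, the toy training set, and the exact-match metric are illustrative placeholders rather than details taken from the paper.

    import dspy
    from dspy.teleprompt import BootstrapFewShot

    # Configure the underlying LM (placeholder model name).
    dspy.settings.configure(lm=dspy.OpenAI(model="gpt-3.5-turbo"))

    class GenerateAnswer(dspy.Signature):
        """Answer questions with short factoid answers."""
        question = dspy.InputField()
        answer = dspy.OutputField(desc="a short answer, 1-5 words")

    class SimpleQA(dspy.Module):
        """A one-module pipeline: chain-of-thought prompting over the signature."""
        def __init__(self):
            super().__init__()
            self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

        def forward(self, question):
            return self.generate_answer(question=question)

    # Tiny illustrative training set; a real program would use a larger one.
    trainset = [
        dspy.Example(question="What is the capital of France?",
                     answer="Paris").with_inputs("question"),
        dspy.Example(question="Who wrote Hamlet?",
                     answer="William Shakespeare").with_inputs("question"),
    ]

    # The metric the compiler maximizes: case-insensitive exact answer match.
    def exact_match(example, pred, trace=None):
        return example.answer.lower() == pred.answer.lower()

    # Compiling bootstraps demonstrations for the module from the training set.
    compiled_qa = BootstrapFewShot(metric=exact_match).compile(
        SimpleQA(), trainset=trainset)
    print(compiled_qa(question="Who developed the theory of relativity?").answer)

The key design point illustrated here is that the program declares only the signature (question in, answer out) and the module composition; the compiler, not the developer, produces the concrete prompts and demonstrations that maximize the chosen metric.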