DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines

Abstract

The ML community is rapidly exploring techniques for prompting language models (LMs) and for stacking them into pipelines that solve complex tasks. Unfortunately, existing LM pipelines are typically implemented using hard-coded "prompt templates", i.e., lengthy strings discovered via trial and error. Toward a more systematic approach for developing and optimizing LM pipelines, we introduce DSPy, a programming model that abstracts LM pipelines as text transformation graphs, i.e., imperative computational graphs where LMs are invoked through declarative modules. DSPy modules are parameterized, meaning they can learn (by creating and collecting demonstrations) how to apply compositions of prompting, finetuning, augmentation, and reasoning techniques. We design a compiler that will optimize any DSPy pipeline to maximize a given metric. We conduct two case studies, showing that succinct DSPy programs can express and optimize sophisticated LM pipelines that reason about math word problems, tackle multi-hop retrieval, answer complex questions, and control agent loops. Within minutes of compiling, a few lines of DSPy allow GPT-3.5 and llama2-13b-chat to self-bootstrap pipelines that outperform standard few-shot prompting (generally by over 25% and 65%, respectively) and pipelines with expert-created demonstrations (by up to 5-46% and 16-40%, respectively). On top of that, DSPy programs compiled to open and relatively small LMs like 770M-parameter T5 and llama2-13b-chat are competitive with approaches that rely on expert-written prompt chains for proprietary GPT-3.5. DSPy is available at https://github.com/stanfordnlp/dspy
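The idea of a parameterized module that learns by collecting its own demonstrations can be sketched in a few lines of plain Python. The sketch below is a toy under stated assumptions and does not use the DSPy library or its API: `toy_lm`, `ToyModule`, `bootstrap_compile`, and `exact_match` are hypothetical names introduced only for illustration, and the "LM" is a deterministic stand-in that adds integers, is weak zero-shot, and improves once the prompt contains a demonstration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (question, gold answer)


def toy_lm(prompt: str) -> str:
    """Hypothetical stand-in for an LM: adds the two integers in the last question.

    Zero-shot it only handles single-digit operands; once the prompt holds
    at least one demonstration it handles any operands.
    """
    lines = prompt.strip().splitlines()
    question = [l for l in lines if l.startswith("Question: ")][-1][len("Question: "):]
    a, b = (int(x) for x in question.split("+"))
    # Demonstration lines look like "Answer: 5"; the final line is a bare "Answer:".
    has_demo = any(l.startswith("Answer: ") for l in lines)
    if not has_demo and (a >= 10 or b >= 10):
        return "?"
    return str(a + b)


@dataclass
class ToyModule:
    """A declarative call whose learnable parameter is its list of demonstrations."""
    signature: str                      # kept only as a label in this sketch
    lm: Callable[[str], str]
    demos: List[Example] = field(default_factory=list)

    def __call__(self, question: str) -> str:
        shots = "".join(f"Question: {q}\nAnswer: {a}\n" for q, a in self.demos)
        return self.lm(f"{shots}Question: {question}\nAnswer:")


def exact_match(gold: str, pred: str) -> bool:
    return gold == pred


def bootstrap_compile(program: ToyModule, trainset: List[Example],
                      metric: Callable[[str, str], bool],
                      max_demos: int = 2) -> ToyModule:
    """Run the program on training inputs and keep its own outputs that pass the metric."""
    demos: List[Example] = []
    for question, gold in trainset:
        pred = program(question)
        if metric(gold, pred):
            demos.append((question, pred))
        if len(demos) == max_demos:
            break
    # Return a new module; the original program is left unchanged.
    return ToyModule(program.signature, program.lm, demos)


def accuracy(program: ToyModule, dataset: List[Example],
             metric: Callable[[str, str], bool]) -> float:
    return sum(metric(gold, program(q)) for q, gold in dataset) / len(dataset)


trainset = [("2+3", "5"), ("40+2", "42"), ("4+4", "8")]
devset = [("12+30", "42"), ("7+1", "8"), ("100+23", "123")]

zero_shot = ToyModule("question -> answer", toy_lm)
compiled = bootstrap_compile(zero_shot, trainset, exact_match)

print("zero-shot accuracy:", accuracy(zero_shot, devset, exact_match))
print("compiled accuracy: ", accuracy(compiled, devset, exact_match))
```

In this toy, the gold labels are used only by the metric; the demonstrations stored in the compiled module are the program's own passing outputs, which is the sense in which the pipeline bootstraps itself. A real system would replace `toy_lm` with calls to an actual model and would search over richer parameters than a single demonstration list.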
