Large Language Models as General Pattern Machines

Abstract

We observe that pre-trained large language models (LLMs) can autoregressively complete complex token sequences prompted in the style of ASCII art. These range from arbitrary sequences procedurally generated by probabilistic context-free grammars (PCFG) to richer spatial patterns from the Abstract Reasoning Corpus (ARC), a general AI benchmark. Surprisingly, pattern completion proficiency can be partially retained even when the sequences are expressed using tokens randomly sampled from the vocabulary. These results suggest that, without any additional training, LLMs can serve as general sequence modelers driven by in-context learning. In this work, we investigate how these zero-shot capabilities may be applied to problems in robotics. The problems range from extrapolating sequences of numbers that represent states over time to complete simple motions, to least-to-most prompting of reward-conditioned trajectories that can discover and represent closed-loop policies (e.g., a stabilizing controller for CartPole). The approach is difficult to deploy on real systems today because of latency, context size limitations, and compute costs. Even so, using LLMs to drive low-level control may offer an exciting glimpse into how the patterns among words could be transferred to actions.
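The abstract does not spell out the prompt formats, so what follows is only a rough illustration under stated assumptions. It assumes three things:

- A grid is rendered as ASCII-art text, one row per line, with space-separated integer cells.
- The "random token" variant applies a fixed, random, one-to-one relabeling of the cell symbols. The relabeling draws from a small hypothetical stand-in vocabulary rather than a real tokenizer's vocabulary.
- Continuous robot states are clipped and uniformly quantized to integers before being written into a prompt.

The function names, the example vocabulary, the quantization range and the number of levels are all hypothetical choices, not the authors' settings.

```python
import random
from typing import Dict, List, Sequence

# Hypothetical helpers sketching how sequences might be serialized into a text prompt.
# None of these names or formats come from the paper; they are illustrative assumptions.


def grid_to_text(grid: Sequence[Sequence[int]]) -> str:
    """Render a 2D grid of small integers as ASCII-art-style text, one row per line."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)


def random_token_map(symbols: Sequence[str], vocab: Sequence[str], seed: int = 0) -> Dict[str, str]:
    """Build a random one-to-one mapping from the original symbols to vocabulary tokens."""
    rng = random.Random(seed)
    # Sampling without replacement guarantees distinct symbols get distinct tokens.
    chosen = rng.sample(list(vocab), k=len(symbols))
    return dict(zip(symbols, chosen))


def remap_text(text: str, mapping: Dict[str, str]) -> str:
    """Replace every space-separated symbol, keeping the line layout intact."""
    return "\n".join(" ".join(mapping[s] for s in line.split()) for line in text.split("\n"))


def quantize_trajectory(states: Sequence[float], low: float, high: float, levels: int = 100) -> List[int]:
    """Clip each state value to [low, high] and map it uniformly to an integer in [0, levels - 1]."""
    out = []
    for x in states:
        x = min(max(x, low), high)  # clip out-of-range values
        out.append(round((x - low) / (high - low) * (levels - 1)))
    return out


if __name__ == "__main__":
    grid = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
    text = grid_to_text(grid)
    # Hypothetical stand-in vocabulary; a real experiment would sample from the model's tokenizer.
    mapping = random_token_map(["0", "1"], ["apple", "delta", "zebra", "moon"], seed=42)
    print(text)
    print(remap_text(text, mapping))
    print(quantize_trajectory([-1.0, 0.0, 1.0], low=-1.0, high=1.0, levels=11))
```

The relabeling is one-to-one, so every repetition and spatial arrangement in the original grid survives in the remapped text. Only the surface tokens change, and that is the property the random-token experiment probes.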
