Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space

Abstract

Reasoning ability, a core component of human intelligence, continues to pose a significant challenge for Large Language Models (LLMs) in the pursuit of AGI. Although model performance has improved under the training scaling law, significant challenges remain, notably catastrophic forgetting in training algorithms and the limited availability of novel training data. As an alternative, test-time scaling enhances reasoning performance by increasing test-time computation without updating model parameters. Whereas prior methods in this paradigm focus on token space, we propose leveraging latent space for more effective reasoning and better adherence to the test-time scaling law. We introduce LatentSeek, a novel framework that enhances LLM reasoning through Test-Time Instance-level Adaptation (TTIA) within the model's latent space. Specifically, LatentSeek uses policy gradient to iteratively update latent representations, guided by self-generated reward signals. LatentSeek is evaluated on a range of reasoning benchmarks, including GSM8K, MATH-500, and AIME2024, across multiple LLM architectures. Results show that LatentSeek consistently outperforms strong baselines, such as Chain-of-Thought prompting and fine-tuning-based methods. Furthermore, our analysis shows that LatentSeek is highly efficient, typically converging within a few iterations for problems of average complexity, while also benefiting from additional iterations, which highlights the potential of test-time scaling in the latent space. These findings position LatentSeek as a lightweight, scalable, and effective solution for enhancing the reasoning capabilities of LLMs.
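To make the core idea concrete, the following is a minimal toy sketch of a policy-gradient update applied to latent vectors while the model weights stay frozen. It is not the LatentSeek implementation: the frozen output head `W`, the stand-in reward `toy_reward`, the running-mean baseline, and all hyperparameters are assumptions chosen for illustration. In the setting described above, the reward would be self-generated by the model rather than computed against a known target.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_probs(W, z):
    """Token distribution from a frozen linear head: softmax(W z)."""
    return softmax([sum(w * zi for w, zi in zip(row, z)) for row in W])


def toy_reward(tokens, target):
    """Hypothetical stand-in for a self-generated reward, in [0, 1]."""
    return sum(a == b for a, b in zip(tokens, target)) / len(target)


def latent_policy_gradient(W, latents, reward_fn, steps=200, lr=0.5, seed=0):
    """REINFORCE with a running-mean baseline on per-token latent vectors.

    Only the latents are updated; W (the "model") is never modified.
    Returns (updated latents, best token sequence seen, its reward).
    """
    rng = random.Random(seed)
    latents = [list(z) for z in latents]  # work on a copy of the input
    vocab = list(range(len(W)))

    # Start from the greedy decode of the initial latents.
    best_tokens = []
    for z in latents:
        p = token_probs(W, z)
        best_tokens.append(p.index(max(p)))
    best_reward = reward_fn(best_tokens)
    baseline = best_reward

    for _ in range(steps):
        if best_reward >= 1.0:  # stop once the reward is maximal
            break
        probs = [token_probs(W, z) for z in latents]
        tokens = [rng.choices(vocab, weights=p)[0] for p in probs]
        reward = reward_fn(tokens)
        advantage = reward - baseline

        for z, p, tok in zip(latents, probs, tokens):
            # grad_z log pi(tok | z) = W^T (onehot(tok) - p)
            for j in range(len(z)):
                grad_j = sum(
                    ((1.0 if k == tok else 0.0) - p[k]) * W[k][j] for k in vocab
                )
                z[j] += lr * advantage * grad_j

        baseline = 0.9 * baseline + 0.1 * reward
        if reward > best_reward:
            best_tokens, best_reward = tokens, reward

    return latents, best_tokens, best_reward
```

A possible call, with a three-token vocabulary and four latent positions (all values illustrative):

```python
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
target = [2, 0, 1, 0]
latents, tokens, reward = latent_policy_gradient(
    W, [[0.0, 0.0] for _ in target], lambda t: toy_reward(t, target)
)
```

The sketch samples tokens rather than decoding greedily so that the advantage term can be negative as well as positive; without some exploration, the update would only reinforce the current output.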
