Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
February 5, 2025
Authors: DiJia Su, Hanlin Zhu, Yingchen Xu, Jiantao Jiao, Yuandong Tian, Qinqing Zheng
cs.AI
Abstract
Large Language Models (LLMs) excel at reasoning and planning when trained on
chain-of-thought (CoT) data, where the step-by-step thought process is
explicitly outlined by text tokens. However, this results in lengthy inputs
where many words support textual coherence rather than core reasoning
information, and processing these inputs consumes substantial computational
resources. In this work, we propose a hybrid representation of the reasoning
process, where we partially abstract away the initial reasoning steps using
latent discrete tokens generated by VQ-VAE, significantly reducing the length
of reasoning traces. We explore the use of latent trace abstractions in two
scenarios: 1) training the model from scratch for the Keys-Finding Maze
problem, and 2) fine-tuning LLMs on this hybrid data with an extended vocabulary
including unseen latent tokens, for both logical and mathematical reasoning
problems. To facilitate effective learning, we introduce a simple training
procedure that randomly mixes latent and text tokens, which enables fast
adaptation to new latent tokens. Our approach consistently outperforms the
baseline methods on various benchmarks.
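The abstract describes two components: a VQ-VAE that compresses chunks of early chain-of-thought text into discrete latent tokens, and a training procedure that randomly mixes latent and text tokens so the model adapts to the new vocabulary. The sketch below illustrates the mixing idea under simple assumptions; the function names, the chunk size, the latent-token id offset, and the sampling scheme are illustrative placeholders, not details taken from the paper.

```python
import random

# Assumed id offset for the extended vocabulary: latent codes produced by
# the (hypothetical) VQ-VAE occupy ids >= LATENT_BASE_ID.
LATENT_BASE_ID = 50_000
CHUNK = 16  # assumed number of text tokens each latent token abstracts away


def quantize_prefix(cot_text_ids, codebook_lookup, num_chunks):
    """Map the first num_chunks * CHUNK CoT text tokens to latent token ids.

    `codebook_lookup` stands in for the trained VQ-VAE encoder plus codebook:
    any callable from a chunk of text-token ids to a code index works here.
    """
    latent_ids = []
    for i in range(num_chunks):
        chunk = cot_text_ids[i * CHUNK:(i + 1) * CHUNK]
        latent_ids.append(LATENT_BASE_ID + codebook_lookup(chunk))
    return latent_ids


def mix_latent_and_text(question_ids, cot_text_ids, codebook_lookup,
                        max_fraction=1.0):
    """Build one training example with a randomly sized latent prefix.

    A random number of leading CoT chunks is replaced by their latent
    abstractions; the remaining reasoning steps stay as plain text tokens.
    """
    total_chunks = len(cot_text_ids) // CHUNK
    num_latent = random.randint(0, int(total_chunks * max_fraction))
    prefix = quantize_prefix(cot_text_ids, codebook_lookup, num_latent)
    suffix = cot_text_ids[num_latent * CHUNK:]
    return question_ids + prefix + suffix
```

Sampling a different latent/text split for every example exposes the model to the full range of mixing ratios during fine-tuning, which is one plausible way to realize the fast adaptation to unseen latent tokens that the abstract reports.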