
Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models

March 28, 2025
作者: Zhanke Zhou, Zhaocheng Zhu, Xuan Li, Mikhail Galkin, Xiao Feng, Sanmi Koyejo, Jian Tang, Bo Han
cs.AI

Abstract

Numerous applications of large language models (LLMs) rely on their ability to perform step-by-step reasoning. However, the reasoning behavior of LLMs remains poorly understood, posing challenges to research, development, and safety. To address this gap, we introduce the landscape of thoughts, the first visualization tool for users to inspect the reasoning paths of chain-of-thought and its derivatives on any multi-choice dataset. Specifically, we represent the states in a reasoning path as feature vectors that quantify their distances to all answer choices. These features are then visualized in two-dimensional plots using t-SNE. Qualitative and quantitative analysis with the landscape of thoughts effectively distinguishes between strong and weak models, correct and incorrect answers, as well as different reasoning tasks. It also uncovers undesirable reasoning patterns, such as low consistency and high uncertainty. Additionally, users can adapt our tool to a model that predicts the property they observe. We showcase this advantage by adapting our tool to a lightweight verifier that evaluates the correctness of reasoning paths. The code is publicly available at: https://github.com/tmlr-group/landscape-of-thoughts.
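The abstract's pipeline (encode each reasoning state as a vector of distances to the answer choices, then project those vectors to 2D with t-SNE) can be illustrated with a minimal sketch. The `distance_to_choices` helper, the random embeddings, and the toy dimensions below are hypothetical stand-ins, not the paper's actual feature extraction; the real implementation is in the linked repository.

```python
# Minimal sketch of the visualization idea: each intermediate state in a
# reasoning path is encoded as a vector of distances to the answer choices,
# and the resulting features are projected to 2D with t-SNE.
# The distance function and sample data are hypothetical stand-ins; see
# https://github.com/tmlr-group/landscape-of-thoughts for the real tool.
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Toy setup: 50 reasoning paths, each with 8 intermediate states,
# on a question with 4 answer choices (A-D).
n_paths, n_states, n_choices, dim = 50, 8, 4, 32

def distance_to_choices(state_embedding, choice_embeddings):
    """Hypothetical distance measure: Euclidean distance from a state's
    embedding to each answer choice's embedding. The paper quantifies
    state-to-choice distances; the exact metric may differ."""
    return np.linalg.norm(choice_embeddings - state_embedding, axis=1)

# Stand-in embeddings (in practice these would come from an LLM).
choice_embeddings = rng.normal(size=(n_choices, dim))
state_embeddings = rng.normal(size=(n_paths, n_states, dim))

# Feature matrix: one row per state, one column per answer choice.
features = np.array([
    distance_to_choices(state, choice_embeddings)
    for path in state_embeddings
    for state in path
])  # shape: (n_paths * n_states, n_choices)

# Project the distance features to 2D and plot the "landscape".
coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)

plt.scatter(coords[:, 0], coords[:, 1], s=8, alpha=0.6)
plt.title("Landscape of thoughts (toy sketch)")
plt.xlabel("t-SNE dim 1")
plt.ylabel("t-SNE dim 2")
plt.show()
```

Because every state is described only by its distances to the answer choices, states drifting toward the correct choice or oscillating among choices become visible as trajectories in the 2D plot, which is what enables the qualitative comparisons the abstract describes.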
