

LLMs Encode Their Failures: Predicting Success from Pre-Generation Activations

February 10, 2026
Authors: William Lugoloobi, Thomas Foster, William Bankes, Chris Russell
cs.AI

Abstract

Running LLMs with extended reasoning on every problem is expensive, but determining which inputs actually require additional compute remains challenging. We investigate whether a model's own likelihood of success is recoverable from its internal representations before generation, and whether this signal can guide more efficient inference. We train linear probes on pre-generation activations to predict policy-specific success on math and coding tasks, substantially outperforming surface features such as question length and TF-IDF. Using E2H-AMC, which provides both human and model performance on identical problems, we show that models encode a model-specific notion of difficulty that is distinct from human difficulty, and that this distinction increases with extended reasoning. Leveraging these probes, we demonstrate that routing queries across a pool of models can exceed the performance of the best single model whilst reducing inference cost by up to 70% on MATH, showing that internal representations enable practical efficiency gains even when they diverge from human intuitions about difficulty. Our code is available at: https://github.com/KabakaWilliam/llms_know_difficulty
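The core technique is a linear probe trained on activations captured before any tokens are generated. Below is a minimal sketch of that idea, assuming a HuggingFace causal LM and a labeled set of (prompt, solved) pairs; the model name, the `pre_generation_activation` helper, the layer choice, and the toy data are illustrative assumptions, not the authors' exact setup.

```python
# Sketch: train a linear probe on pre-generation activations to predict
# whether a policy model will solve a problem. Illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

model_name = "Qwen/Qwen2.5-7B-Instruct"  # placeholder policy model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

def pre_generation_activation(prompt: str, layer: int = -1) -> torch.Tensor:
    """Hidden state of the final prompt token, before any generation."""
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return out.hidden_states[layer][0, -1]  # shape: (hidden_dim,)

# Toy labeled data: 1 if the policy model previously solved the problem.
prompts = ["What is 7 * 8?", "Prove that sqrt(2) is irrational."]
solved = [1, 0]

X = torch.stack([pre_generation_activation(p) for p in prompts]).float().numpy()
probe = LogisticRegression(max_iter=1000).fit(X, solved)
p_success = probe.predict_proba(X)[:, 1]  # predicted success probability
```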
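Routing can then be a simple cost-aware rule on top of per-model probes. The sketch below is again illustrative; the threshold and fallback rule are assumptions, not the paper's exact routing policy. It sends each query to the cheapest model whose probe predicts success above a threshold.

```python
# Sketch: probe-guided routing across a pool of models (assumed policy).
import numpy as np

def route(activations, probes, costs, threshold=0.8):
    """Pick a model for one query.

    activations: dict model_name -> pre-generation activation (1D np.ndarray)
    probes:      dict model_name -> fitted sklearn classifier
    costs:       dict model_name -> relative inference cost
    """
    def p_success(name):
        return probes[name].predict_proba(activations[name].reshape(1, -1))[0, 1]

    # Try models from cheapest to most expensive; stop at the first one
    # the probe expects to succeed.
    for name in sorted(costs, key=costs.get):
        if p_success(name) >= threshold:
            return name
    # No model clears the bar: fall back to the highest predicted success.
    return max(probes, key=p_success)
```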