Failing to Explore: Language Models on Interactive Tasks
January 29, 2026
Authors: Mahdi JafariRaviz, Keivan Rezaei, Arshia Soltani Moakhar, Zahra Sodagar, Yize Cheng, Soheil Feizi
cs.AI
Abstract
We evaluate language models on their ability to explore interactive environments under a limited interaction budget. We introduce three parametric tasks with controllable exploration difficulty, spanning continuous and discrete environments. Across state-of-the-art models, we find systematic under-exploration and suboptimal solutions, with performance often significantly worse than simple explore--exploit heuristic baselines and scaling weakly as the budget increases. Finally, we study two lightweight interventions: splitting a fixed budget into parallel executions, which surprisingly improves performance despite a no-gain theoretical result for our tasks, and periodically summarizing the interaction history, which preserves key discoveries and further improves exploration.
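The "simple explore--exploit heuristic baselines" mentioned above can be illustrated with a minimal sketch. The snippet below is a hypothetical epsilon-greedy baseline for a discrete, bandit-style environment under a fixed interaction budget; the function names, parameters, and environment interface are illustrative assumptions, not the paper's actual tasks or baselines.

```python
import random

def epsilon_greedy_baseline(pull, n_arms, budget, epsilon=0.1):
    """Hypothetical explore-exploit heuristic: spend a fixed interaction
    budget on a discrete environment with `n_arms` actions. With
    probability `epsilon` explore a random arm; otherwise exploit the
    arm with the best observed mean reward. `pull(arm)` is an assumed
    environment interface returning a scalar reward."""
    counts = [0] * n_arms      # how many times each arm was pulled
    sums = [0.0] * n_arms      # cumulative reward per arm
    total = 0.0
    for _ in range(budget):
        if random.random() < epsilon or all(c == 0 for c in counts):
            arm = random.randrange(n_arms)  # explore
        else:
            # exploit: arm with highest empirical mean among pulled arms
            arm = max(
                (a for a in range(n_arms) if counts[a] > 0),
                key=lambda a: sums[a] / counts[a],
            )
        r = pull(arm)
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total
```

Even this crude strategy guarantees nonzero exploration throughout the budget, which is the property the paper finds language models often lack.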