Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking

February 4, 2025
Authors: Jinyang Wu, Mingkuan Feng, Shuai Zhang, Ruihan Jin, Feihu Che, Zengqi Wen, Jianhua Tao
cs.AI

Abstract

Multimodal large language models (MLLMs) exhibit impressive capabilities but still face challenges in complex visual reasoning. Recent efforts attempt to enhance MLLMs' reasoning by incorporating OpenAI o1-like structured thinking through explicit search structures or teacher-guided distillation, but they often struggle to balance performance and efficiency. A critical limitation is their heavy reliance on extensive data and search spaces, which makes both the extraction of implicit insights and the use of data inefficient. To address this, we propose AStar, an Automated Structured thinking paradigm for multimodal reasoning via Monte Carlo Tree Search (MCTS). AStar automatically derives high-level cognitive reasoning patterns from limited data using MCTS-powered hierarchical structures. Building on these explicit patterns, we design a unified reasoning framework that integrates models' internal reasoning capabilities with external reasoning guidelines, enabling efficient inference with minimal tree iterations. This paradigm strikes a compelling balance between performance and efficiency. Extensive experiments demonstrate AStar's effectiveness: with a 7B backbone it achieves 54.0% accuracy on the MathVerse benchmark, surpassing GPT-4o (50.2%), while maintaining substantial data and computational efficiency.
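
For readers unfamiliar with MCTS, the following is a minimal sketch of the generic algorithm (selection by UCT, expansion, random rollout, backpropagation) applied to a toy problem. It is not AStar's implementation, which the abstract does not describe in detail. The action names, the `TARGET` pattern, the reward function, and the exploration constant are all hypothetical choices made only for illustration.

```python
import math
import random

# Hypothetical action names and target pattern; invented for this toy demo.
ACTIONS = ["decompose", "caption", "step", "verify"]
TARGET = ("caption", "decompose", "step")
DEPTH = len(TARGET)


def reward(path):
    """Toy reward: fraction of positions where path agrees with TARGET."""
    return sum(a == b for a, b in zip(path, TARGET)) / DEPTH


class Node:
    def __init__(self, path=(), parent=None):
        self.path = path          # sequence of actions from the root
        self.parent = parent
        self.children = {}        # action -> Node
        self.visits = 0
        self.value = 0.0          # sum of rewards backed up through this node

    def is_terminal(self):
        return len(self.path) == DEPTH

    def untried(self):
        return [a for a in ACTIONS if a not in self.children]


def uct_child(node, c=1.4):
    """Return the child with the highest UCT score (mean reward + exploration bonus)."""
    return max(
        node.children.values(),
        key=lambda ch: ch.value / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def search(iterations=500, seed=0):
    rng = random.Random(seed)
    root = Node()
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.is_terminal() and not node.untried():
            node = uct_child(node)
        # 2. Expansion: add one untried child, if any.
        if not node.is_terminal():
            action = rng.choice(node.untried())
            child = Node(node.path + (action,), parent=node)
            node.children[action] = child
            node = child
        # 3. Simulation: complete the path with random actions.
        path = node.path
        while len(path) < DEPTH:
            path += (rng.choice(ACTIONS),)
        r = reward(path)
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    # Read out the most-visited action at each level.
    best, node = [], root
    while node.children:
        action, node = max(node.children.items(), key=lambda kv: kv[1].visits)
        best.append(action)
    return tuple(best)
```

Calling `search()` returns the most-visited action sequence found in the tree. In a setting like the one the abstract describes, such high-visit paths are the kind of structure from which reusable reasoning patterns could be read off, though how AStar actually does this is not stated here.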
