Learning to Reason via Mixture-of-Thought for Logical Reasoning

Abstract

Humans naturally draw on multiple reasoning modalities, that is, different representational formats such as natural language, code, and symbolic logic, to learn and to solve logical problems. In contrast, most existing LLM-based approaches operate with a single reasoning modality during training, typically natural language. Although some methods have explored modality selection or augmentation at inference time, the training process remains modality-blind, which limits synergy among modalities. To fill this gap, we propose Mixture-of-Thought (MoT), a framework that enables LLMs to reason across three complementary modalities: natural language, code, and a newly introduced symbolic modality, the truth table, which systematically enumerates logical cases and partially mitigates key failure modes in natural language reasoning. MoT adopts a two-phase design: (1) self-evolving MoT training, which jointly learns from filtered, self-generated rationales across modalities; and (2) MoT inference, which fully leverages the synergy of the three modalities to produce better predictions. Experiments on logical reasoning benchmarks, including FOLIO and ProofWriter, demonstrate that the MoT framework consistently and significantly outperforms strong LLM baselines that use single-modality chain-of-thought approaches, achieving an average accuracy gain of up to +11.7pp. Further analyses show that the MoT framework benefits both the training and inference stages; that it is particularly effective on harder logical reasoning problems; and that different modalities contribute complementary strengths, with truth-table reasoning helping to overcome key bottlenecks in natural language inference.
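As a rough illustration of what "systematically enumerating logical cases" can mean, the following minimal Python sketch checks a conclusion against every truth assignment that satisfies the premises. It is not the authors' implementation: in MoT the truth table is a rationale written by the LLM itself, whereas this sketch enumerates the cases programmatically. The function name `truth_table_label`, the propositional encoding, and the label set ("True", "False", "Unknown", plus "Inconsistent" for contradictory premises) are assumptions made for illustration only.

```python
from itertools import product
from typing import Callable, Dict, List

# Hypothetical encoding: a formula is a function from a truth assignment to bool.
Assignment = Dict[str, bool]
Formula = Callable[[Assignment], bool]


def truth_table_label(
    variables: List[str],
    premises: List[Formula],
    conclusion: Formula,
) -> str:
    """Label a conclusion by enumerating every row of the truth table.

    Only rows in which all premises hold (the models of the premises)
    are used to judge the conclusion.
    """
    models = []
    for values in product([True, False], repeat=len(variables)):
        row = dict(zip(variables, values))
        if all(premise(row) for premise in premises):
            models.append(row)

    if not models:
        # No assignment satisfies the premises (assumed extra label).
        return "Inconsistent"

    outcomes = {conclusion(row) for row in models}
    if outcomes == {True}:
        return "True"      # conclusion holds in every model
    if outcomes == {False}:
        return "False"     # conclusion fails in every model
    return "Unknown"       # conclusion holds in some models only


# Example: "if it rains, the ground is wet" and "it rains".
variables = ["rain", "wet"]
implies = lambda a: (not a["rain"]) or a["wet"]
rains = lambda a: a["rain"]

label_wet = truth_table_label(variables, [implies, rains], lambda a: a["wet"])
label_dry = truth_table_label(variables, [implies, rains], lambda a: not a["wet"])
label_open = truth_table_label(variables, [implies], lambda a: a["wet"])
print(label_wet, label_dry, label_open)
```

The exhaustive loop visits 2^n rows for n variables, so a sketch like this is only practical for small propositional problems; it is meant to show the case-by-case style of reasoning, not a scalable solver.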
