Efficient Tool Use with Chain-of-Abstraction Reasoning

Abstract

To achieve faithful reasoning that aligns with human expectations, large language models (LLMs) need to ground their reasoning in real-world knowledge (e.g., web facts, math and physical rules). Tools help LLMs access this external knowledge, but challenges remain in fine-tuning LLM agents (e.g., Toolformer) to invoke tools in multi-step reasoning problems, where interconnected tool calls require holistic and efficient tool usage planning. In this work, we propose a new method for LLMs to better leverage tools in multi-step reasoning. Our method, Chain-of-Abstraction (CoA), trains LLMs to first decode reasoning chains with abstract placeholders, and then call domain tools to reify each reasoning chain by filling in specific knowledge. This planning with abstract chains enables LLMs to learn more general reasoning strategies, which are robust to shifts of domain knowledge (e.g., math results) relevant to different reasoning questions. It also allows LLMs to perform decoding and calling of external tools in parallel, which avoids the inference delay caused by waiting for tool responses. In mathematical reasoning and Wiki QA domains, we show that our method consistently outperforms previous chain-of-thought and tool-augmented baselines on both in-distribution and out-of-distribution test sets, with an average ~6% absolute QA accuracy improvement. LLM agents trained with our method also show more efficient tool use, with inference on average ~1.4x faster than baseline tool-augmented LLMs.
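The two-stage idea described above (first decode a chain with abstract placeholders, then call a domain tool to fill in concrete values) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the text: the `[expression = yN]` placeholder syntax, the function names `calculator` and `reify`, and the sample chain are all hypothetical, and the toy arithmetic evaluator merely stands in for a real domain tool.

```python
import ast
import operator
import re

# Hypothetical placeholder syntax "[expression = yN]"; an assumption for
# illustration, not a format confirmed by the abstract.
STEP = re.compile(r"\[([^\[\]=]+)=\s*(y\d+)\s*\]")

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str):
    """Toy stand-in for a domain tool: evaluates + - * / on numbers."""

    def evaluate(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -evaluate(node.operand)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        raise ValueError(f"unsupported expression: {expression!r}")

    return evaluate(ast.parse(expression.strip(), mode="eval").body)


def reify(abstract_chain: str) -> str:
    """Fill each placeholder in the chain with the tool's result."""
    values = {}

    def solve(match):
        expression, name = match.group(1), match.group(2)
        # Substitute results of earlier steps before calling the tool.
        for known, value in values.items():
            expression = re.sub(rf"\b{known}\b", str(value), expression)
        values[name] = calculator(expression)
        return f"{expression.strip()} = {values[name]}"

    # re.sub visits matches left to right, so earlier placeholders are
    # always resolved before later ones that depend on them.
    return STEP.sub(solve, abstract_chain)


# A hypothetical abstract chain, as a model might decode it in stage one.
chain = "Together they have [20 + 35 = y1] apples; doubling gives [y1 * 2 = y2]."
print(reify(chain))
```

In this sketch the chain itself contains no computed numbers, so the same reasoning pattern would apply unchanged to different operands; that is the sense in which an abstract chain is independent of the specific knowledge filled in later. The sketch fills placeholders sequentially and does not attempt to show the parallel decoding and tool calling mentioned in the abstract.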
