Contrastive Chain-of-Thought Prompting

Abstract

Despite the success of chain of thought in enhancing language model reasoning, the underlying process remains less well understood. Although logically sound reasoning appears inherently crucial for chain of thought, prior studies surprisingly reveal minimal impact when using invalid demonstrations instead. Furthermore, the conventional chain of thought does not inform language models on what mistakes to avoid, which potentially leads to more errors. Hence, inspired by how humans can learn from both positive and negative examples, we propose contrastive chain of thought to enhance language model reasoning. Compared to the conventional chain of thought, our approach provides both valid and invalid reasoning demonstrations, to guide the model to reason step-by-step while reducing reasoning mistakes. To improve generalization, we introduce an automatic method to construct contrastive demonstrations. Our experiments on reasoning benchmarks demonstrate that contrastive chain of thought can serve as a general enhancement of chain-of-thought prompting.
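
As a rough illustration of the idea, the sketch below builds a prompt that pairs one valid reasoning demonstration with an invalid one. The abstract does not specify how the automatic construction works, so the corruption heuristic used here (reordering the numbers inside a valid rationale) is only an assumption chosen for illustration. The function names and the prompt labels ("Explanation", "Wrong explanation") are likewise hypothetical.

```python
import random
import re


def make_invalid_rationale(rationale: str, seed: int = 0) -> str:
    """Return a corrupted copy of a rationale by reordering its numbers.

    Assumed heuristic for illustration only; the source does not describe
    the actual construction method.
    """
    numbers = re.findall(r"\d+", rationale)
    if len(set(numbers)) < 2:
        # Fewer than two distinct numbers: no reordering can change the text.
        return rationale
    rng = random.Random(seed)
    shuffled = numbers[:]
    while shuffled == numbers:
        rng.shuffle(shuffled)
    replacements = iter(shuffled)
    # Substitute each number, left to right, with the next shuffled value.
    return re.sub(r"\d+", lambda _match: next(replacements), rationale)


def build_contrastive_prompt(
    demo_question: str,
    valid_rationale: str,
    answer: str,
    query: str,
    seed: int = 0,
) -> str:
    """Assemble a one-shot prompt with a valid and an invalid demonstration."""
    invalid_rationale = make_invalid_rationale(valid_rationale, seed)
    return (
        f"Question: {demo_question}\n"
        f"Explanation: {valid_rationale}\n"
        f"Answer: {answer}\n"
        f"Wrong explanation: {invalid_rationale}\n"
        "\n"
        f"Question: {query}\n"
        "Explanation:"
    )


if __name__ == "__main__":
    print(
        build_contrastive_prompt(
            demo_question="James has 30 teeth and loses 6. How many remain?",
            valid_rationale="James has 30 teeth. He loses 6, so 30 - 6 = 24 remain.",
            answer="24",
            query="A shelf holds 12 books and 5 are removed. How many are left?",
        )
    )
```

The resulting string can be sent to any language model as an ordinary few-shot prompt; the only difference from a conventional chain-of-thought prompt is the extra "Wrong explanation" line in each demonstration.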
