Learning Task Decomposition to Assist Humans in Competitive Programming

Abstract

When using language models (LMs) to solve complex problems, humans might struggle to understand the LM-generated solutions and repair the flawed ones. To assist humans in repairing them, we propose to automatically decompose complex solutions into multiple simpler pieces that correspond to specific subtasks. We introduce a novel objective for learning task decomposition, termed assistive value (AssistV), which measures the feasibility and speed for humans to repair the decomposed solution. We collect a dataset of human repair experiences on different decomposed solutions. Utilizing the collected data as in-context examples, we then learn to critique, refine, and rank decomposed solutions to improve AssistV. We validate our method on competitive programming problems: in a 177-hour human study, our method enables non-experts to solve 33.3% more problems, speeds them up by 3.3x, and empowers them to match unassisted experts.
