Learning Task Decomposition to Assist Humans in Competitive Programming

Abstract

When using language models (LMs) to solve complex problems, humans might struggle to understand the LM-generated solutions and to repair the flawed ones. To help humans repair them, we propose automatically decomposing complex solutions into multiple simpler pieces that correspond to specific subtasks. We introduce a novel objective for learning task decomposition, termed assistive value (AssistV), which measures whether humans can repair the decomposed solution and how quickly they can do so. We collect a dataset of human repair experiences on different decomposed solutions. Using the collected data as in-context examples, we then learn to critique, refine, and rank decomposed solutions to improve AssistV. We validate our method on competitive programming problems: in a 177-hour human study, our method enables non-experts to solve 33.3% more problems, speeds them up by 3.3×, and empowers them to match unassisted experts.
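
The abstract describes AssistV only at a high level, as a measure of whether and how quickly humans can repair a decomposed solution. As a rough illustration of how such a measure could be used to rank candidate decompositions, here is a minimal sketch. Everything in it is an assumption for illustration: the `RepairRecord` type, the `assistive_value` and `rank_decompositions` helpers, the 60-minute time limit, and the equal 0.5/0.5 weighting of feasibility and speed are hypothetical and are not the paper's definition of AssistV. In the described method, an LM critiques, refines, and ranks decompositions using the collected repair data as in-context examples; this sketch only mimics the final ranking step on observed (made-up) repair outcomes.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class RepairRecord:
    """One hypothetical human attempt to repair a decomposed solution."""
    solved: bool     # feasibility: did the attempt end in a correct program?
    minutes: float   # speed: time the human spent on the attempt


def assistive_value(records: list[RepairRecord], time_limit: float = 60.0) -> float:
    """Hypothetical AssistV-style score in [0, 1] (NOT the paper's formula).

    Each successful attempt earns 0.5 for feasibility plus up to 0.5 for
    speed, scaled by the fraction of the time limit left unused; a failed
    attempt earns 0. The score is the mean over all attempts.
    """
    if not records:
        return 0.0

    def attempt_score(r: RepairRecord) -> float:
        if not r.solved:
            return 0.0
        time_left = 1.0 - min(r.minutes, time_limit) / time_limit
        return 0.5 + 0.5 * time_left

    return mean(attempt_score(r) for r in records)


def rank_decompositions(candidates: dict[str, list[RepairRecord]]) -> list[str]:
    """Return candidate names ordered from most to least assistive."""
    return sorted(candidates, key=lambda name: assistive_value(candidates[name]), reverse=True)


if __name__ == "__main__":
    # Made-up repair logs for two decompositions of the same solution.
    candidates = {
        "monolithic": [RepairRecord(False, 60.0), RepairRecord(True, 50.0)],
        "three_subtasks": [RepairRecord(True, 20.0), RepairRecord(True, 30.0)],
    }
    print(rank_decompositions(candidates))  # ['three_subtasks', 'monolithic']
```

Giving a failed repair zero credit puts feasibility first while still rewarding faster successful repairs, which mirrors the two properties the abstract names; other weightings would be equally plausible.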
