Learning Task Decomposition to Assist Humans in Competitive Programming

Abstract



When language models (LMs) are used to solve complex problems, humans may struggle to understand the LM-generated solutions and to repair the flawed ones. To help humans repair them, we propose automatically decomposing a complex solution into multiple simpler pieces, each corresponding to a specific subtask. We introduce a novel objective for learning task decomposition, termed assistive value (AssistV), which measures how feasibly and how quickly humans can repair the decomposed solution. We collect a dataset of human repair experiences on different decomposed solutions. Using the collected data as in-context examples, we then learn to critique, refine, and rank decomposed solutions to improve AssistV. We validate our method on competitive programming problems: in a 177-hour human study, our method enables non-experts to solve 33.3% more problems, speeds them up by 3.3x, and empowers them to match unassisted experts.
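To make the idea of ranking decompositions by assistive value concrete, the following is a minimal sketch under stated assumptions. It is not the authors' method: the paper uses the collected repair data as in-context examples for an LM that critiques, refines, and ranks, and its exact AssistV definition is not given here. The sketch instead scores logged repairs directly with a hypothetical proxy that follows the abstract's wording. Feasibility is the success rate, and speed is the negated median time of successful repairs. The names `RepairRecord`, `assistive_value_proxy`, and `rank_decompositions` and the toy log values are all illustrative inventions.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class RepairRecord:
    """One hypothetical logged attempt by a human to repair a decomposed solution."""
    solved: bool     # did the human end up with a correct program?
    minutes: float   # time spent on the attempt


def assistive_value_proxy(records: list[RepairRecord]) -> tuple[float, float]:
    """Hypothetical AssistV proxy: (success rate, negated median minutes of successes).

    Tuples compare lexicographically, so feasibility (success rate) dominates
    and speed (faster = larger negated time) only breaks ties.
    """
    if not records:
        return (0.0, float("-inf"))
    success_times = [r.minutes for r in records if r.solved]
    rate = len(success_times) / len(records)
    speed = -median(success_times) if success_times else float("-inf")
    return (rate, speed)


def rank_decompositions(candidates: dict[str, list[RepairRecord]]) -> list[str]:
    """Order candidate decompositions from most to least assistive under the proxy."""
    return sorted(
        candidates,
        key=lambda name: assistive_value_proxy(candidates[name]),
        reverse=True,
    )


if __name__ == "__main__":
    # Toy, made-up repair logs for three ways of decomposing the same solution.
    logs = {
        "monolithic": [RepairRecord(False, 30), RepairRecord(True, 25)],
        "three_subtasks": [RepairRecord(True, 8), RepairRecord(True, 12)],
        "five_subtasks": [RepairRecord(True, 10), RepairRecord(True, 14)],
    }
    print(rank_decompositions(logs))
    # -> ['three_subtasks', 'five_subtasks', 'monolithic']
```

Using a lexicographic tuple rather than a weighted sum avoids choosing an arbitrary trade-off between repair success and repair time. A real estimate of AssistV would need the paper's own definition.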
