Learning Task Decomposition to Assist Humans in Competitive Programming

Abstract

When using language models (LMs) to solve complex problems, humans may struggle to understand the LM-generated solutions and to repair the flawed ones. To assist humans in repairing them, we propose to automatically decompose complex solutions into multiple simpler pieces that correspond to specific subtasks. We introduce a novel objective for learning task decomposition, termed assistive value (AssistV), which measures how feasibly and how quickly humans can repair the decomposed solution. We collect a dataset of human repair experiences on different decomposed solutions. Using the collected data as in-context examples, we then learn to critique, refine, and rank decomposed solutions to improve AssistV. We validate our method on competitive programming problems: in a 177-hour human study, our method enables non-experts to solve 33.3% more problems, speeds them up by 3.3x, and empowers them to match unassisted experts.
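The abstract does not give a formula for AssistV, so the following is only an illustrative sketch of the ranking step, not the authors' definition. It assumes each candidate decomposition comes with a few recorded repair attempts, and it uses a hypothetical proxy score that puts feasibility (fraction of successful repairs) first and speed (mean time of successful repairs) second. The names `RepairRecord`, `assistive_score`, and `rank_decompositions` are invented for this example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RepairRecord:
    """One hypothetical human repair attempt on a decomposed solution."""
    solved: bool    # did the human end up with a correct solution?
    minutes: float  # time spent on the attempt


def assistive_score(records: list[RepairRecord]) -> tuple[float, float]:
    """Illustrative proxy only (an assumption, not the paper's AssistV).

    Returns (success_rate, -mean_minutes_of_successful_repairs), so that
    larger tuples are better: feasibility first, then speed.
    """
    if not records:
        return (0.0, float("-inf"))
    solved_times = [r.minutes for r in records if r.solved]
    success_rate = len(solved_times) / len(records)
    mean_time = mean(solved_times) if solved_times else float("inf")
    return (success_rate, -mean_time)


def rank_decompositions(candidates: dict[str, list[RepairRecord]]) -> list[str]:
    """Order candidate decompositions from most to least assistive."""
    return sorted(
        candidates,
        key=lambda name: assistive_score(candidates[name]),
        reverse=True,
    )


# Made-up records for two candidate decompositions of the same solution.
candidates = {
    "monolithic": [RepairRecord(False, 30.0), RepairRecord(True, 25.0)],
    "three_subtasks": [RepairRecord(True, 10.0), RepairRecord(True, 12.0)],
}
ranking = rank_decompositions(candidates)
```

In this toy data the decomposition into three subtasks ranks first because every repair attempt on it succeeded and the successful attempts were faster. In the setting the abstract describes, such scores would presumably be predicted by a model from in-context examples rather than measured directly for every candidate.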
