Executable Functional Abstractions: Inferring Generative Programs for Advanced Math Problems

Abstract

Scientists often infer abstract procedures from specific instances of problems and use the abstractions to generate new, related instances. For example, programs encoding the formal rules and properties of a system have been useful in fields ranging from reinforcement learning (procedural environments) to physics (simulation engines). These programs can be seen as functions that execute to different outputs depending on their parameterizations (e.g., gridworld configuration or initial physical conditions). We introduce the term EFA (Executable Functional Abstraction) to denote such programs for math problems. EFA-like constructs have been shown to be useful for math reasoning as problem generators for stress-testing models. However, prior work has been limited to abstractions for grade-school math, whose simple rules are easy to encode in programs, while generating EFAs for advanced math has thus far required human engineering. We explore the automatic construction of EFAs for advanced math problems. We operationalize this as a program synthesis task and develop EFAGen, which conditions a large language model (LLM) on a seed math problem and its step-by-step solution to generate candidate EFA programs that are faithful to the generalized problem and solution class underlying the seed problem. Furthermore, we formalize the properties any valid EFA must possess as executable unit tests, and show how these tests can be used as verifiable rewards to train LLMs to become better writers of EFAs. We demonstrate that EFAs constructed by EFAGen behave rationally by remaining faithful to seed problems, that they produce learnable problem variations, and that EFAGen can infer EFAs across multiple diverse sources of competition-level math problems. Finally, we show downstream uses of model-written EFAs, such as finding problem variations that are harder or easier for a learner to solve, as well as data generation.
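To make the idea concrete, the following is a minimal sketch of what an EFA and its unit-test-based reward might look like. It is an illustration under stated assumptions, not the interface used by EFAGen: the class name `LinearCongruenceEFA`, the methods `seed`, `sample`, `render`, and `solve`, the function `verifiable_reward`, and the seed problem itself are all hypothetical and chosen only for this example.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearCongruenceEFA:
    """Hypothetical EFA for the problem class: find the smallest
    non-negative integer x with a*x ≡ b (mod m), where gcd(a, m) = 1."""
    a: int
    b: int
    m: int

    @classmethod
    def seed(cls):
        # Parameters of the (assumed) seed problem: 3x ≡ 4 (mod 7).
        return cls(a=3, b=4, m=7)

    @classmethod
    def sample(cls, rng):
        # Draw new parameters that stay inside the same problem class.
        m = rng.randint(5, 50)
        a = rng.choice([k for k in range(2, m) if math.gcd(k, m) == 1])
        b = rng.randrange(m)
        return cls(a=a, b=b, m=m)

    def render(self):
        # Turn the parameters into a problem statement.
        return ("Find the smallest non-negative integer x such that "
                f"{self.a}x ≡ {self.b} (mod {self.m}).")

    def solve(self):
        # Modular inverse via three-argument pow (Python 3.8+).
        return (pow(self.a, -1, self.m) * self.b) % self.m


def verifiable_reward(efa_cls, seed_answer, n_variants=20, rng_seed=0):
    """Return 1.0 if the candidate EFA passes a few generic unit tests,
    else 0.0. The specific tests here are illustrative assumptions."""
    rng = random.Random(rng_seed)
    try:
        # Test 1: the EFA reproduces the seed problem's known answer.
        if efa_cls.seed().solve() != seed_answer:
            return 0.0
        variants = [efa_cls.sample(rng) for _ in range(n_variants)]
        # Test 2: sampling yields more than one distinct problem.
        if len({v.render() for v in variants}) < 2:
            return 0.0
        # Test 3: every variant renders to text and solves deterministically.
        for v in variants:
            if not v.render() or v.solve() != v.solve():
                return 0.0
    except Exception:
        # Any crash in the candidate program counts as a failed test.
        return 0.0
    return 1.0


print(LinearCongruenceEFA.seed().render())
print(LinearCongruenceEFA.seed().solve())  # 6, since 3 * 6 = 18 ≡ 4 (mod 7)
print(verifiable_reward(LinearCongruenceEFA, seed_answer=6))
```

In this sketch the reward is binary and depends only on executing the candidate program, which is what allows it to be checked automatically; a candidate that disagrees with the seed answer or produces no variation scores zero.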
