Hyperagents

Abstract

Self-improving AI systems aim to reduce reliance on human engineering by learning to improve their own learning and problem-solving processes. Existing approaches to self-improvement rely on fixed, handcrafted meta-level mechanisms, which fundamentally limits how fast such systems can improve. The Darwin Gödel Machine (DGM) demonstrates open-ended self-improvement in coding by repeatedly generating and evaluating self-modified variants. Because both evaluation and self-modification are coding tasks, gains in coding ability can translate into gains in self-improvement ability. However, this alignment does not generally hold beyond coding domains. We introduce hyperagents, self-referential agents that integrate a task agent (which solves the target task) and a meta agent (which modifies itself and the task agent) into a single editable program. Crucially, the meta-level modification procedure is itself editable, enabling metacognitive self-modification: improving not only the task-solving behavior but also the mechanism that generates future improvements. We instantiate this framework by extending DGM to create DGM-Hyperagents (DGM-H). DGM-H eliminates the assumption of domain-specific alignment between task performance and self-modification skill, and can thereby potentially support self-accelerating progress on any computable task. Across diverse domains, the DGM-H improves performance over time and outperforms baselines without self-improvement or open-ended exploration, as well as prior self-improving systems. Furthermore, the DGM-H improves the process by which it generates new agents (e.g., persistent memory, performance tracking), and these meta-level improvements transfer across domains and accumulate across runs. DGM-Hyperagents offer a glimpse of open-ended AI systems that do not merely search for better solutions, but continually improve their search for how to improve.
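
To make the structure described above concrete, the following is a minimal toy sketch, not the DGM-H implementation. It assumes a numeric stand-in for an agent: `guess` plays the role of the task agent's behavior and `step` plays the role of the meta-level modification procedure, and both are edited by the same self-modification step. The names `Hyperagent`, `task_score`, `self_modify`, and `run` are hypothetical. The numeric task and the uniform parent sampling are simplifying assumptions chosen only for illustration.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hyperagent:
    """Toy stand-in for one editable program holding both levels (assumption)."""
    guess: float  # task-level state: the task agent's current answer
    step: float   # meta-level state: how boldly the meta agent edits


def task_score(agent: Hyperagent, target: float = 10.0) -> float:
    """Hypothetical task: higher is better, 0.0 is a perfect answer."""
    return -abs(agent.guess - target)


def self_modify(agent: Hyperagent, rng: random.Random) -> Hyperagent:
    """Meta step: edits the meta-level step size AND the task-level guess."""
    new_step = agent.step * rng.choice([0.5, 1.0, 2.0])
    new_guess = agent.guess + rng.uniform(-new_step, new_step)
    return Hyperagent(guess=new_guess, step=new_step)


def run(iterations: int = 200, seed: int = 0) -> List[Hyperagent]:
    """Grow an archive of self-modified variants from a single seed agent."""
    rng = random.Random(seed)
    archive = [Hyperagent(guess=0.0, step=0.1)]
    for _ in range(iterations):
        # Uniform parent sampling is a simplification made for this sketch.
        parent = rng.choice(archive)
        child = self_modify(parent, rng)
        # Every variant is kept, so weaker ones can still serve as parents.
        archive.append(child)
    return archive


if __name__ == "__main__":
    archive = run()
    best = max(archive, key=task_score)
    print(f"guess={best.guess:.3f} step={best.step:.3f} score={task_score(best):.3f}")
```

Because `step` is part of the agent rather than a fixed constant of the outer loop, a variant that changes how it edits itself passes that change on to its descendants. This is only a loose analogy for the editable meta-level procedure described in the abstract.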