TIDE: Proactive Discovery of Multiple Problems via Template-Guided Iteration

Abstract

Agents are widely deployed as assistants over documents, tools, and code. However, they typically act only on explicit user requests, which surface only the problems the user has already noticed. Many other important problems coexist within the broader user context, hidden in plain sight, and their total number is not known in advance. We frame this as the task of discovering multiple hidden problems from context: coexisting problems must be uncovered, grounded in supporting evidence, and paired with concrete actions. To this end, we introduce TIDE, a template-guided iterative framework with two complementary mechanisms. The first, iterative discovery, is motivated by the observation that single-pass prediction anchors on the most salient cases and yields generic claims; instead, it surfaces a small batch of candidates per round while conditioning on what has already been found, so that subsequent rounds extend coverage. The second mechanism uses thought templates: reusable schemas distilled from previously solved cases that specify which contextual signals to attend to and how to connect them, anchoring each prediction in a recognizable problem class. We validate TIDE in two realistic settings, personal workspaces and software repositories, across four model backbones, and show substantial gains over single-shot and parallel multi-agent baselines in task coverage, identification, and resolution.
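The control flow of the two mechanisms can be illustrated with a toy sketch. It is not the authors' implementation: the names `ThoughtTemplate`, `Finding`, `propose_batch`, and `iterative_discovery` are hypothetical, the context lines and templates are made-up examples, and the model call that would propose candidates is replaced by simple substring matching of template signals against context lines. What it is meant to show is the loop described above: each round surfaces at most `batch_size` new candidates, conditions on the findings from earlier rounds, and stops once a round adds nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThoughtTemplate:
    """Hypothetical template: a problem class plus the cues that reveal it."""
    problem_class: str
    signals: tuple[str, ...]  # cues that must co-occur in one piece of context
    action: str               # concrete fix suggested for this problem class


@dataclass(frozen=True)
class Finding:
    problem_class: str
    evidence: tuple[str, ...]  # context lines that ground the claim
    action: str


def propose_batch(context, templates, found, batch_size):
    """Stand-in for one model call: return up to `batch_size` NEW findings.

    A real system would prompt a model with the context, the templates and the
    findings so far; this toy version matches template signals by substring.
    """
    seen = {(f.problem_class, f.evidence) for f in found}  # condition on prior rounds
    batch = []
    for t in templates:
        for line in context:
            key = (t.problem_class, (line,))
            # skip already-reported findings and lines missing any template signal
            if key in seen or not all(s in line for s in t.signals):
                continue
            batch.append(Finding(t.problem_class, (line,), t.action))
            seen.add(key)
            if len(batch) == batch_size:
                return batch
    return batch


def iterative_discovery(context, templates, batch_size=2, max_rounds=10):
    """Run rounds until one adds nothing new or the round budget is spent."""
    found = []
    for _ in range(max_rounds):
        batch = propose_batch(context, templates, found, batch_size)
        if not batch:
            break
        found.extend(batch)  # later rounds see everything found so far
    return found


# Made-up example context and templates, for illustration only.
CONTEXT = [
    "TODO: rotate API key before launch",
    "meeting moved to Friday, calendar still says Thursday",
    "TODO: update README install steps",
    "invoice due 2024-05-01, no reminder set",
]
TEMPLATES = [
    ThoughtTemplate("stale-todo", ("TODO",), "Create a tracked task"),
    ThoughtTemplate("schedule-conflict", ("moved", "calendar"), "Update the calendar entry"),
    ThoughtTemplate("missing-reminder", ("due", "no reminder"), "Add a reminder"),
]

if __name__ == "__main__":
    for f in iterative_discovery(CONTEXT, TEMPLATES):
        print(f.problem_class, "|", f.evidence[0], "->", f.action)
```

With `max_rounds=1` the sketch returns only the first batch of two findings, a rough analogue of a single pass capped at one batch; running further rounds reaches all four without re-reporting earlier findings. Passing the already-found set into each round is what lets later batches move past the first matches instead of repeating them.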