Externalizing Research Synthesis and Validation for AI Scientists through a Research Harness

Abstract

AI systems can increasingly automate scientific workflows, but the reasoning that links prior evidence, generated ideas, experiments and final claims often remains implicit inside model inference. Here we introduce Xcientist, a research harness that externalizes research synthesis and experimental validation into inspectable, contract-governed processes. Xcientist organizes literature evidence, idea states, implementation plans, ablation records and repair traces as persistent research artifacts, so that generated mechanisms can be grounded, executed, tested and revised without losing their evidential basis. We identify claim drift as a failure mode of automated research, in which runnable artifacts no longer support the mechanism originally claimed. Across training-free memory systems, graph-structured traffic forecasting and multi-scale physics-informed neural networks, Xcientist preserves traceable trajectories from problem formulation to mechanism design, validation and bounded revision. These results suggest that AI scientists should be evaluated not only by their final artifacts, but also by whether their synthesis and validation processes remain attributable, inspectable and scientifically accountable.
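
To make the notion of claim drift concrete, the following is a minimal illustrative sketch, not the Xcientist implementation. It assumes a claimed mechanism can be summarized as a set of named components and that a runnable artifact records which components it actually implements. All names (`MechanismClaim`, `RunnableArtifact`, `apply_repair`, `has_claim_drift`, the component labels and the evidence identifiers) are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class MechanismClaim:
    """Hypothetical record of a claimed mechanism and its supporting evidence."""
    name: str
    required_components: frozenset[str]
    evidence_ids: tuple[str, ...] = ()


@dataclass
class RunnableArtifact:
    """Hypothetical record of what an executable implementation contains."""
    claim_name: str
    implemented_components: set[str] = field(default_factory=set)
    repair_trace: list[str] = field(default_factory=list)


def apply_repair(artifact: RunnableArtifact, removed: str, note: str) -> None:
    """Remove a component during a repair and log the change in the trace."""
    artifact.implemented_components.discard(removed)
    artifact.repair_trace.append(note)


def missing_components(claim: MechanismClaim, artifact: RunnableArtifact) -> frozenset[str]:
    """Return the claimed components that the artifact no longer implements."""
    return claim.required_components - artifact.implemented_components


def has_claim_drift(claim: MechanismClaim, artifact: RunnableArtifact) -> bool:
    """Assumed definition: drift occurs when any claimed component is missing."""
    return bool(missing_components(claim, artifact))


# Illustrative data only; these are not results or components from the paper.
claim = MechanismClaim(
    name="gated_graph_forecaster",
    required_components=frozenset({"graph_attention", "temporal_gate"}),
    evidence_ids=("paper_A", "paper_B"),
)
artifact = RunnableArtifact(
    claim_name=claim.name,
    implemented_components={"graph_attention", "temporal_gate"},
)

drift_before = has_claim_drift(claim, artifact)
apply_repair(
    artifact,
    removed="graph_attention",
    note="dropped graph_attention to fix a shape mismatch",
)
drift_after = has_claim_drift(claim, artifact)

print(drift_before, drift_after, sorted(missing_components(claim, artifact)))
# prints: False True ['graph_attention']
```

In this toy setting the artifact still runs after the repair, yet it no longer contains a component the claim depends on. Because the repair trace is kept alongside the claim, the point at which the artifact diverged from the claimed mechanism remains inspectable.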