OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents

Abstract

Deep search has become a crucial capability for frontier multimodal agents, enabling models to solve complex questions through active search, evidence verification, and multi-step reasoning. Despite rapid progress, top-tier multimodal search agents remain difficult to reproduce, largely because open high-quality training data, transparent trajectory synthesis pipelines, and detailed training recipes are unavailable. To address this, we introduce OpenSearch-VL, a fully open-source recipe for training frontier multimodal deep search agents with agentic reinforcement learning. First, we build a dedicated data pipeline that constructs high-quality training data through Wikipedia path sampling, fuzzy entity rewriting, and source-anchor visual grounding; together, these steps reduce shortcut solutions and one-step retrieval collapse. From this pipeline, we curate two training datasets: SearchVL-SFT-36k for supervised fine-tuning (SFT) and SearchVL-RL-8k for reinforcement learning (RL). In addition, we design a diverse tool environment that unifies text search, image search, OCR, cropping, sharpening, super-resolution, and perspective correction, enabling agents to combine active perception with external knowledge acquisition. Finally, we propose a multi-turn, fatal-aware GRPO training algorithm that handles cascading tool failures by masking post-failure tokens while preserving useful pre-failure reasoning through one-sided advantage clamping. Built on this recipe, OpenSearch-VL delivers substantial performance gains, with average improvements of more than 10 points across seven benchmarks, and achieves results comparable to proprietary commercial models on several tasks. We will release all data, code, and models to support open research on multimodal deep search agents.
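The abstract does not spell out the exact update rule of fatal-aware GRPO, so the following is only a minimal sketch of one plausible reading. It assumes standard GRPO group-normalized advantages, (r - mean) / (std + eps), and interprets "one-sided advantage clamping" as clamping pre-failure advantages from below at zero, so that a rollout ending in a fatal tool failure is not penalized for the reasoning it produced before the failure. The names `Rollout`, `fatal_index`, `group_advantages`, and `fatal_aware_token_weights`, as well as the clamping direction, are hypothetical choices for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Optional, Tuple


@dataclass
class Rollout:
    """One sampled trajectory in a GRPO group (hypothetical structure)."""
    reward: float                      # scalar outcome reward for the trajectory
    num_tokens: int                    # number of policy-generated tokens
    fatal_index: Optional[int] = None  # first token after a fatal tool failure; None if no failure


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantage as in standard GRPO: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def fatal_aware_token_weights(
    group: List[Rollout], eps: float = 1e-6
) -> List[Tuple[List[float], List[int]]]:
    """Per-token (advantage, loss mask) for each rollout in a group.

    Assumed rule for this sketch:
      * tokens at or after fatal_index are masked out (mask 0, advantage 0);
      * tokens before fatal_index keep the advantage clamped from below at 0,
        so a failed rollout is never penalized for its pre-failure reasoning.
    Rollouts without a fatal failure use the plain GRPO advantage.
    """
    out = []
    for rollout, adv in zip(group, group_advantages([r.reward for r in group], eps)):
        advs: List[float] = []
        mask: List[int] = []
        for t in range(rollout.num_tokens):
            if rollout.fatal_index is None:
                advs.append(adv)
                mask.append(1)
            elif t < rollout.fatal_index:
                advs.append(max(adv, 0.0))  # one-sided clamp: keep credit, drop blame
                mask.append(1)
            else:
                advs.append(0.0)  # post-failure token: excluded from the loss
                mask.append(0)
        out.append((advs, mask))
    return out


# Usage: one successful rollout, one that hits a fatal tool error at token 2,
# and one that simply answers wrongly.
group = [
    Rollout(reward=1.0, num_tokens=4),
    Rollout(reward=0.0, num_tokens=4, fatal_index=2),
    Rollout(reward=0.0, num_tokens=4),
]
for advs, mask in fatal_aware_token_weights(group):
    print([round(a, 3) for a in advs], mask)
```

In this toy group, the rollout that fails at token 2 gets advantage 0 on its first two tokens instead of roughly -0.707, and its last two tokens are masked out of the loss, while the rollout that answers wrongly without a tool failure still receives the full negative advantage. Clamping only from below keeps any positive credit a failed rollout might earn, which matches the stated goal of preserving useful pre-failure reasoning.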