Jan-nano Technical Report

Abstract

Most language models face a fundamental tradeoff: powerful capabilities require substantial computational resources. We shatter this constraint with Jan-nano, a 4B-parameter language model that redefines efficiency through radical specialization: instead of trying to know everything, it masters the art of finding anything instantly. Fine-tuned from Qwen3-4B using our novel multi-stage RLVR system, which completely eliminates reliance on next-token prediction training (SFT), Jan-nano achieves 83.2% on the SimpleQA benchmark with MCP integration while running on consumer hardware. With a 128K context length, Jan-nano proves that intelligence is not about scale but about strategy.
