Ling and Ring 2.6 Technical Report: Efficient and Instant Agentic Intelligence at Trillion-Parameter Scale

Abstract

Efficient and scalable agentic intelligence requires models that deliver both low-latency responses and strong reasoning capabilities while remaining practical to train, serve, and deploy. In this report, we present Ling-2.6 and Ring-2.6, a family of models designed to address this challenge at scale. Ling-2.6 is optimized for instant response generation and high capability per output token, whereas Ring-2.6 is tailored for deeper reasoning and more advanced agentic workflows. Rather than training from scratch, we upgrade the Ling-2.0 base model through architectural migration pre-training and large-scale post-training. This upgrade is guided by a unified co-design of model architecture, optimization objectives, serving systems, and agent training environments, which improves both model capability and deployment efficiency. At the architectural level, we introduce a hybrid linear attention design that integrates Lightning Attention with MLA, improving the efficiency of long-context training and decoding. To further improve token efficiency, we optimize capability per output token through Evolutionary Chain-of-Thought, Linguistic Unit Policy Optimization, bidirectional preference alignment, and shortest-correct-response distillation. For agentic capabilities, we propose KPop, a reinforcement learning framework designed to support stable training of Ring-2.6-1T on large-scale environment-grounded data. KPop improves training efficiency through asynchronous scheduling across coding, search, tool use, and workflow execution, enabling scalable learning from complex agent-environment interactions. Together, Ling-2.6 and Ring-2.6 provide a practical pathway toward efficient, scalable, and open agentic systems. We open-source all checkpoints in the 2.6 family to support further research and development in practical agentic intelligence.