Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning

Abstract

Context limits in large language models (LLMs) bottleneck both reasoning accuracy and efficiency. To break them, we propose the Thread Inference Model (TIM), a family of LLMs trained for recursive and decompositional problem solving, and TIMRUN, an inference runtime that enables long-horizon structured reasoning beyond context limits. TIM hosted on TIMRUN supports virtually unlimited working memory and multi-hop tool calls within a single language model inference, overcoming output limits, positional-embedding constraints, and GPU-memory bottlenecks. This is achieved by modeling natural language as reasoning trees, measured by both length and depth, instead of linear sequences. The reasoning trees consist of tasks with thoughts, recursive subtasks, and conclusions, based on the concept we proposed in Schroeder et al., 2025. During generation, we maintain a working memory that retains only the key-value (KV) states of the most relevant context tokens, selected by a rule-based subtask-pruning mechanism, which enables reuse of positional embeddings and GPU memory pages throughout reasoning. Experimental results show that our system sustains high inference throughput even when manipulating up to 90% of the KV cache in GPU memory. It also delivers accurate reasoning on mathematical tasks and handles information retrieval challenges that require long-horizon reasoning and multi-hop tool use.
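The reasoning-tree and subtask-pruning idea can be illustrated with a small Python sketch. This is not the TIM or TIMRUN implementation: the `Task` class, the `working_memory` and `full_trace` helpers, and the specific pruning rule (a finished subtask keeps its thought and conclusion, while its nested subtasks are dropped) are assumptions made for illustration only. Strings stand in for the token spans whose key-value states a runtime would retain.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """One node of a reasoning tree (hypothetical structure, for illustration)."""
    thought: str
    subtasks: List["Task"] = field(default_factory=list)
    conclusion: str = ""  # empty string means the task is still in progress


def full_trace(task: Task) -> List[str]:
    """Flatten the whole tree, as a linear sequence would store it."""
    out = [f"thought: {task.thought}"]
    for sub in task.subtasks:
        out.extend(full_trace(sub))
    if task.conclusion:
        out.append(f"conclusion: {task.conclusion}")
    return out


def working_memory(task: Task) -> List[str]:
    """Entries kept under the ASSUMED pruning rule.

    A subtask that already has a conclusion contributes only its thought
    and conclusion; everything nested beneath it is pruned. A subtask
    still in progress is expanded recursively.
    """
    kept = [f"thought: {task.thought}"]
    for sub in task.subtasks:
        if sub.conclusion:
            kept.append(f"thought: {sub.thought}")
            kept.append(f"conclusion: {sub.conclusion}")
        else:
            kept.extend(working_memory(sub))
    if task.conclusion:
        kept.append(f"conclusion: {task.conclusion}")
    return kept


# A toy tree: the first subtask is finished, the second is in progress.
root = Task(
    thought="Compute (2 + 3) * 4",
    subtasks=[
        Task(
            thought="Add 2 and 3",
            subtasks=[Task(thought="Recall 2 + 3", conclusion="5")],
            conclusion="2 + 3 = 5",
        ),
        Task(thought="Multiply the sum by 4"),
    ],
)

print(len(full_trace(root)), "entries in the full trace")
print(len(working_memory(root)), "entries kept in working memory")
```

In this toy tree the nested step under the finished subtask is absent from working memory, so the retained set is smaller than the full trace. The abstract states only that the selection is rule-based, so the rule shown here should be read as one plausible instance, not the authors' rule.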
