Learning Evidence Highlighting for Frozen LLMs

Abstract

Large Language Models (LLMs) can reason well, yet often miss decisive evidence when it is buried in long, noisy contexts. We introduce HiLight, an Evidence Emphasis framework that decouples evidence selection from reasoning for frozen LLM solvers. HiLight avoids compressing or rewriting the input, which can discard or distort evidence, by training a lightweight Emphasis Actor to insert minimal highlight tags around pivotal spans in the unaltered context. A frozen Solver then performs downstream reasoning on the emphasized input. We cast highlighting as a weakly supervised decision-making problem and optimize the Actor with reinforcement learning using only the Solver's task reward, requiring no evidence labels and no access to or modification of the Solver. Across sequential recommendation and long-context question answering, HiLight consistently improves performance over strong prompt-based and automated prompt-optimization baselines. The learned emphasis policy transfers zero-shot to both smaller and larger unseen Solver families, including an API-based Solver, suggesting that the Actor captures genuine, reusable evidence structure rather than overfitting to a single backbone.
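The emphasis step described above, wrapping pivotal spans in tags while leaving every original character in place, can be illustrated with a minimal sketch. The tag strings `<hl>` and `</hl>`, the function names, and the character-offset span format are assumptions made for illustration; the abstract does not specify the actual markup or how the Emphasis Actor represents its selections. The sketch covers only tag insertion, not the reinforcement-learning training of the Actor.

```python
from typing import List, Tuple

# Hypothetical tag strings; the abstract does not specify the actual markup.
OPEN_TAG = "<hl>"
CLOSE_TAG = "</hl>"


def insert_highlights(context: str, spans: List[Tuple[int, int]]) -> str:
    """Wrap each (start, end) character span in highlight tags.

    Spans are half-open [start, end) and must not overlap. The original
    characters are never edited or dropped; only tags are added.
    """
    pieces = []
    cursor = 0
    for start, end in sorted(spans):
        if not (cursor <= start < end <= len(context)):
            raise ValueError(f"invalid or overlapping span: {(start, end)}")
        pieces.append(context[cursor:start])  # untouched text before the span
        pieces.append(OPEN_TAG + context[start:end] + CLOSE_TAG)
        cursor = end
    pieces.append(context[cursor:])  # untouched text after the last span
    return "".join(pieces)


def strip_highlights(text: str) -> str:
    """Remove the tags again (assumes the context itself contains no tags)."""
    return text.replace(OPEN_TAG, "").replace(CLOSE_TAG, "")


# Toy interaction history; in the paper's setting the spans would come from
# the learned Emphasis Actor rather than being chosen by hand.
context = "The user bought a tent. The user bought a stove. The user likes jazz."
pivotal = "The user bought a stove."
start = context.index(pivotal)
emphasized = insert_highlights(context, [(start, start + len(pivotal))])
```

Because only tags are added, stripping them recovers the input exactly. This is the property that separates highlighting from compression or rewriting, which can discard or distort evidence.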
