Thinking Before Constraining: A Unified Decoding Framework for Large Language Models

Abstract

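Natural generation lets large language models (LLMs) produce free-form responses grounded in rich reasoning, but the resulting outputs lack structure and are therefore hard to verify. Constrained decoding, by contrast, guarantees a standardized format, but it imposes constraints early in the generation process and can unintentionally limit the model's reasoning. This paper proposes In-Writing, a hybrid approach that combines free-form reasoning and structured generation in a single call. The model first reasons without constraints, and structured decoding is applied only after a trigger token has been generated, which explicitly separates reasoning from formatting. We show that the trigger-token strategies can virtually eliminate premature triggering, a failure mode in which constrained decoding interrupts reasoning that is still in progress. Evaluations on a range of datasets covering classification and reasoning tasks show that the approach outperforms the state of the art, with accuracy gains of up to 27% over natural generation. The code is available at: https://github.com/Nokia-Bell-Labs/InWriting.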

English

Natural generation allows Large Language Models (LLMs) to produce free-form responses with rich reasoning, yet the lack of structure makes outputs difficult to verify. Conversely, constrained decoding ensures standardized formats but can inadvertently restrict reasoning capabilities by imposing constraints too early in the generation process. We propose a hybrid approach, In-Writing, that combines free-form reasoning and structured generation in a single call. The model first performs unconstrained reasoning and applies structured decoding only after a trigger token is generated, explicitly decoupling reasoning from formatting. We show that our trigger-token strategies virtually eradicate premature triggering, a failure mode in which constrained decoding interrupts ongoing reasoning. Evaluations across diverse datasets covering classification and reasoning tasks demonstrate that our approach outperforms the state of the art, achieving accuracy gains of up to 27% over natural generation. Our code is available at: https://github.com/Nokia-Bell-Labs/InWriting.
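
To make the two-phase idea concrete, the following is a minimal sketch of trigger-gated decoding on a toy model. It is not the authors' implementation: the trigger string `<answer>`, the functions `decode` and `toy_step`, and the label set are all hypothetical names chosen for illustration, and the scripted toy model stands in for a real LLM. The sketch only shows the control flow the abstract describes, namely unconstrained greedy generation until a trigger token appears, followed by a choice restricted to the permitted outputs.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical trigger token; the abstract does not name a specific one.
TRIGGER = "<answer>"

Scores = Dict[str, float]


def decode(
    step_fn: Callable[[Sequence[str]], Scores],
    labels: Sequence[str],
    max_steps: int = 50,
) -> List[str]:
    """Greedy decoding: unconstrained until TRIGGER, then restricted to labels."""
    output: List[str] = []
    for _ in range(max_steps):
        scores = step_fn(output)
        # Reasoning phase: pick the best token with no constraint applied.
        token = max(scores, key=scores.get)
        output.append(token)
        if token == TRIGGER:
            # Structured phase: only tokens in `labels` may be emitted.
            scores = step_fn(output)
            label = max(labels, key=lambda t: scores.get(t, float("-inf")))
            output.append(label)
            break
    return output


def toy_step(prefix: Sequence[str]) -> Scores:
    """Scripted stand-in for a language model (assumption, for illustration)."""
    script = ["2", "+", "2", "is", "four", TRIGGER]
    if len(prefix) < len(script):
        # During reasoning the scripted token always outscores the label "yes".
        return {script[len(prefix)]: 1.0, "yes": 0.5}
    # After the trigger the toy model would prefer free text over a label.
    return {"probably": 0.9, "yes": 0.6, "no": 0.1}


result = decode(toy_step, labels=["yes", "no"])
print(result)
```

In this toy run the reasoning tokens pass through untouched, and the constraint matters only after the trigger: the unconstrained favorite `probably` is excluded and the best permitted label is chosen instead. A real system would typically mask logits with a grammar or schema engine rather than filter a dictionary of scores.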