Text Generation Beyond Discrete Token Sampling

Abstract

In standard autoregressive generation, an LLM predicts the next-token distribution, samples a discrete token, and then discards the distribution, passing only the sampled token as new input. To preserve this distribution's rich information, we propose Mixture of Inputs (MoI), a training-free method for autoregressive generation. After generating a token following the standard paradigm, we construct a new input that blends the generated discrete token with the previously discarded token distribution. Specifically, we employ a Bayesian estimation method that treats the token distribution as the prior, the sampled token as the observation, and replaces the conventional one-hot vector with the continuous posterior expectation as the new model input. MoI allows the model to maintain a richer internal representation throughout the generation process, resulting in improved text quality and reasoning capabilities. On mathematical reasoning, code generation, and PhD-level QA tasks, MoI consistently improves performance across multiple models including QwQ-32B, Nemotron-Super-49B, Gemma-3-27B, and DAPO-Qwen-32B, with no additional training and negligible computational overhead.
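A simple Dirichlet-categorical model illustrates the prior/observation/posterior step described above. This is a minimal sketch, not the paper's exact formulation. It assumes the mixture weights follow a Dirichlet prior with concentration `concentration * p`, where `p` is the next-token distribution, and it treats the sampled token as a single categorical observation. Conjugacy then gives the posterior mean `(concentration * p + y) / (concentration + 1)`, where `y` is the one-hot vector of the sampled token. Because an embedding lookup `E[token]` equals `one_hot @ E`, the continuous input is taken here to be `w @ E`, a weighted sum of embedding rows.

The function names and the `concentration` parameter are hypothetical. The weighting used in the actual method may differ.

```python
import numpy as np


def posterior_mixture_weights(probs, sampled_token, concentration=1.0):
    """Posterior mean of the input mixture weights for a 1-D distribution.

    Illustrative assumption: the weights follow a Dirichlet prior
    Dir(concentration * probs), and the sampled token is one categorical
    observation. Conjugacy gives the posterior mean
    (concentration * probs + one_hot) / (concentration + 1).
    """
    probs = np.asarray(probs, dtype=float)
    if concentration < 0:
        raise ValueError("concentration must be non-negative")
    if not 0 <= sampled_token < probs.shape[-1]:
        raise IndexError("sampled_token is outside the vocabulary")
    # One observed count for the sampled token (the "observation").
    counts = np.zeros_like(probs)
    counts[sampled_token] = 1.0
    # Dirichlet-categorical update: add the observed count to the prior concentration.
    alpha_post = concentration * probs + counts
    # Normalize to the posterior mean; this also absorbs small rounding error in probs.
    return alpha_post / alpha_post.sum()


def mixed_input_embedding(probs, sampled_token, embedding_matrix, concentration=1.0):
    """Replace the one-hot lookup E[token] with w @ E, a weighted sum of embedding rows."""
    weights = posterior_mixture_weights(probs, sampled_token, concentration)
    return weights @ np.asarray(embedding_matrix, dtype=float)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vocab_size, d_model = 5, 4
    E = rng.normal(size=(vocab_size, d_model))       # hypothetical embedding table
    logits = np.array([2.0, 1.0, 0.5, 0.1, -1.0])     # hypothetical next-token logits
    p = np.exp(logits - logits.max())
    p /= p.sum()                                      # softmax -> next-token distribution
    token = int(rng.choice(vocab_size, p=p))          # standard sampling step
    x_next = mixed_input_embedding(p, token, E, concentration=1.0)
    print(token, x_next.shape)                        # x_next has shape (4,)
```

Setting `concentration=0` recovers the standard one-hot input. Larger values pull the input toward the full distribution `p`. For example, with `p = [0.5, 0.3, 0.2]`, sampled token `1`, and `concentration=1.0`, the weights are approximately `[0.25, 0.65, 0.1]`.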
