Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling

Abstract


The dominant approach to generating from language models subject to some constraint is locally constrained decoding (LCD), incrementally sampling tokens at each time step such that the constraint is never violated. Typically, this is achieved through token masking: looping over the vocabulary and excluding non-conforming tokens. There are two important problems with this approach. (i) Evaluating the constraint on every token can be prohibitively expensive, since LM vocabularies often exceed 100,000 tokens. (ii) LCD can distort the global distribution over strings, sampling tokens based only on local information, even if they lead down dead-end paths. This work introduces a new algorithm that addresses both of these problems. First, to avoid evaluating a constraint on the full vocabulary at each step of generation, we propose an adaptive rejection sampling algorithm that typically requires orders of magnitude fewer constraint evaluations. Second, we show how this algorithm can be extended to produce low-variance, unbiased estimates of importance weights at a very small additional cost; these estimates can be soundly used within previously proposed sequential Monte Carlo algorithms to correct for the myopic behavior of local constraint enforcement. Through extensive empirical evaluation in text-to-SQL, molecular synthesis, goal inference, pattern matching, and JSON domains, we show that our approach is superior to state-of-the-art baselines, supporting a broader class of constraints and improving both runtime and performance. Additional theoretical and empirical analyses show that our method's runtime efficiency is driven by its dynamic use of computation, scaling with the divergence between the unconstrained and constrained LM; as a consequence, runtime improvements are greater for better models.
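The contrast between token masking and rejection-based sampling can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the next-token distribution is a plain list of probabilities and that the constraint is a boolean predicate on token ids. `token_masking` evaluates the constraint on every token, as in standard LCD. `adaptive_rejection` draws a token from the remaining LM mass and, if the token is rejected, removes it and redraws. It still returns an exact sample from the masked distribution, but it only evaluates the constraint on tokens it actually proposes. This is an illustration of the general idea, not the paper's algorithm, and it does not reproduce the paper's unbiased importance-weight estimator. `token_masking` computes the exact local normalizer Z, which is the incremental weight an SMC particle receives when it is extended with this locally constrained proposal.

```python
import random
from typing import Callable, Sequence, Tuple

Constraint = Callable[[int], bool]  # hypothetical: predicate on token ids


def token_masking(probs: Sequence[float], allowed: Constraint,
                  rng: random.Random) -> Tuple[int, float, int]:
    """LCD baseline: check every token, then sample from the masked distribution.

    Returns (token, local normalizer Z, number of constraint evaluations).
    """
    mask = [allowed(t) for t in range(len(probs))]  # |V| constraint calls
    weights = [p if ok else 0.0 for p, ok in zip(probs, mask)]
    z = sum(weights)  # Z = LM mass on tokens that satisfy the constraint
    if z <= 0.0:
        raise ValueError("no token satisfies the constraint")
    token = rng.choices(range(len(probs)), weights=weights)[0]
    return token, z, len(probs)


def adaptive_rejection(probs: Sequence[float], allowed: Constraint,
                       rng: random.Random) -> Tuple[int, int]:
    """Sample from the same masked distribution, checking tokens lazily.

    Draw a token from the remaining LM mass; if it violates the constraint,
    remove it from the pool and draw again. Returns (token, evaluations).
    """
    # Zero-probability tokens can never be proposed, so drop them up front.
    remaining = {t: p for t, p in enumerate(probs) if p > 0.0}
    evals = 0
    while remaining:
        toks = list(remaining)  # O(|V|) bookkeeping, but no constraint calls
        tok = rng.choices(toks, weights=[remaining[t] for t in toks])[0]
        evals += 1
        if allowed(tok):
            return tok, evals
        del remaining[tok]  # a rejected token is never proposed again
    raise ValueError("no token satisfies the constraint")


def is_even(t: int) -> bool:
    """Hypothetical toy constraint: only even token ids are allowed."""
    return t % 2 == 0


if __name__ == "__main__":
    probs = [0.4, 0.3, 0.1, 0.1, 0.05, 0.05]  # toy next-token distribution
    rng = random.Random(0)
    _, z, mask_evals = token_masking(probs, is_even, rng)
    n = 20_000
    counts = [0] * len(probs)
    total_evals = 0
    for _ in range(n):
        tok, evals = adaptive_rejection(probs, is_even, rng)
        counts[tok] += 1
        total_evals += evals
    print("Z =", z, "| masking evals per step =", mask_evals)
    print("adaptive evals per step (mean) =", total_evals / n)
    print("empirical:", [round(c / n, 3) for c in counts])
    print("target:   ", [round(p / z, 3) if is_even(t) else 0.0
                         for t, p in enumerate(probs)])
```

The key design choice is removing each rejected token from the pool. Doing so leaves the relative probabilities of the allowed tokens unchanged, so the accepted token follows the masked distribution exactly. Meanwhile, the number of constraint calls depends on how much LM mass falls on disallowed tokens rather than on the vocabulary size.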
