Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction

Abstract

We design a suite of minimal algorithmic tasks that loosely abstract open-ended real-world tasks, which lets us cleanly and controllably quantify the creative limits of present-day language models. Much like real-world tasks that require a creative, far-sighted leap of thought, our tasks require an implicit, open-ended stochastic planning step that either (a) discovers new connections in an abstract knowledge graph (as in wordplay, drawing analogies, or research) or (b) constructs new patterns (as in designing math problems or new proteins). On these tasks, we argue empirically and conceptually that next-token learning is myopic and memorizes excessively; in comparison, multi-token approaches, namely teacherless training and diffusion models, excel at producing diverse and original output. We also find that, in our tasks, to elicit randomness from the Transformer without hurting coherence, it is better to inject noise directly at the input layer (via a method we dub hash-conditioning) than to rely on temperature sampling at the output layer. Our work thus offers a principled, minimal test-bed for analyzing open-ended creative skills, along with new arguments for going beyond next-token learning and softmax-based sampling. Part of the code is available at https://github.com/chenwu98/algorithmic-creativity

