In-Place Test-Time Training

Abstract

The static "train then deploy" paradigm fundamentally limits the ability of Large Language Models (LLMs) to dynamically adapt their weights in response to the continuous streams of new information inherent in real-world tasks. Test-Time Training (TTT) offers a compelling alternative by updating a subset of model parameters (fast weights) at inference time. However, its potential in the current LLM ecosystem is hindered by critical barriers, including architectural incompatibility, computational inefficiency, and fast-weight objectives that are misaligned with language modeling. In this work, we introduce In-Place Test-Time Training (In-Place TTT), a framework that seamlessly endows LLMs with Test-Time Training ability. In-Place TTT treats the final projection matrix of the ubiquitous MLP blocks as its adaptable fast weights, enabling a "drop-in" enhancement for LLMs without costly retraining from scratch. Furthermore, we replace TTT's generic reconstruction objective with a tailored, theoretically grounded objective explicitly aligned with the Next-Token-Prediction task that governs autoregressive language modeling. This principled objective, combined with an efficient chunk-wise update mechanism, yields a highly scalable algorithm compatible with context parallelism. Extensive experiments validate the effectiveness of our framework: as an in-place enhancement, it enables a 4B-parameter model to achieve superior performance on tasks with contexts of up to 128k tokens, and when pretrained from scratch, it consistently outperforms competitive TTT-related approaches. Ablation studies provide further insight into our design choices. Collectively, our results establish In-Place TTT as a promising step towards a paradigm of continual learning in LLMs.
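
To make the chunk-wise fast-weight idea above more concrete, the following is a minimal pure-Python sketch, not the implementation used in this work. It treats a single projection matrix `W` as the fast weights, lets every token in a chunk read the same copy of `W`, and then applies one update per chunk. The squared-error loss, the per-token target vectors, the learning rate, and all function and variable names are assumptions made for illustration only; the abstract does not specify the actual Next-Token-Prediction-aligned objective.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def chunkwise_fast_weight_pass(W, hiddens, targets, chunk_size, lr):
    """Project each hidden vector with W, updating W once per chunk.

    ASSUMPTION: the update is a plain gradient step on the summed loss
    0.5 * ||W h - t||^2 over the chunk. This is a stand-in objective,
    chosen only to show the chunk-wise control flow.
    """
    W = [row[:] for row in W]  # copy so the caller's weights are untouched
    outputs = []
    for start in range(0, len(hiddens), chunk_size):
        chunk_h = hiddens[start:start + chunk_size]
        chunk_t = targets[start:start + chunk_size]

        # 1) All tokens in the chunk use the fast weights as they were at
        #    the start of the chunk, so this step could run in parallel.
        chunk_out = [matvec(W, h) for h in chunk_h]
        outputs.extend(chunk_out)

        # 2) One accumulated gradient step for the whole chunk, using the
        #    outputs computed in step 1 (not recomputed per token).
        for out, h, t in zip(chunk_out, chunk_h, chunk_t):
            for i in range(len(W)):
                err = out[i] - t[i]
                for j in range(len(h)):
                    W[i][j] -= lr * err * h[j]
    return outputs, W


if __name__ == "__main__":
    W0 = [[0.0, 0.0], [0.0, 0.0]]
    hs = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    ts = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    outs, W1 = chunkwise_fast_weight_pass(W0, hs, ts, chunk_size=2, lr=1.0)
    print(outs)
    print(W1)
```

In this toy setup, tokens in the first chunk see the initial weights, while tokens in later chunks benefit from what was written into `W` by earlier chunks. Updating once per chunk rather than once per token is what keeps the within-chunk computation parallel.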