Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence

Abstract

Recent advances in depth-recurrent language models show that recurrence can decouple train-time compute and parameter count from test-time compute. In this work, we study how to convert existing pretrained non-recurrent language models into depth-recurrent models. We find that using a curriculum of recurrences to increase the effective depth of the model over the course of training preserves performance while reducing total computational cost. In our experiments on mathematics, we observe that converting pretrained models to recurrent ones results in better performance at a given compute budget than simply post-training the original non-recurrent language model.

