RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

April 11, 2024
Authors: Aleksandar Botev, Soham De, Samuel L Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, Thomas Mesnard, Cassidy Hardin, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Armand Joulin, Noah Fiedel, Evan Senter, Yutian Chen, Srivatsan Srinivasan, Guillaume Desjardins, David Budden, Arnaud Doucet, Sharad Vikram, Adam Paszke, Trevor Gale, Sebastian Borgeaud, Charlie Chen, Andy Brock, Antonia Paterson, Jenny Brennan, Meg Risdal, Raj Gundluru, Nesh Devanathan, Paul Mooney, Nilay Chauhan, Phil Culliton, Luiz Gustavo Martins, Elisa Bandy, David Huntsperger, Glenn Cameron, Arthur Zucker, Tris Warkentin, Ludovic Peran, Minh Giang, Zoubin Ghahramani, Clément Farabet, Koray Kavukcuoglu, Demis Hassabis, Raia Hadsell, Yee Whye Teh, Nando de Freitas
cs.AI

Abstract

We introduce RecurrentGemma, an open language model which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language tasks. It has a fixed-size state, which reduces memory use and enables efficient inference on long sequences. We provide a pre-trained model with 2B non-embedding parameters, and an instruction-tuned variant. Both models achieve performance comparable to Gemma-2B despite being trained on fewer tokens.
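
The fixed-size state is the abstract's central efficiency claim, so a small illustration may help. The sketch below, in JAX (the ecosystem the Gemma models are released in), implements a plain diagonal linear recurrence under jax.lax.scan. It is a hypothetical stand-in, not the Griffin architecture itself, which adds RG-LRU gating and local attention; the names linear_recurrence, step, and state_dim are illustrative. The point it demonstrates is that decoding carries only a constant-size state vector, unlike a Transformer's KV cache, which grows with sequence length.

```python
# Minimal, hypothetical sketch of a diagonal linear recurrence in JAX.
# This is NOT the Griffin/RecurrentGemma implementation: it omits the
# RG-LRU gating, local attention, and real model dimensions. It only
# illustrates why a fixed-size recurrent state keeps per-step memory
# constant, independent of sequence length.
import jax
import jax.numpy as jnp

def linear_recurrence(a, x):
    """Scan h_t = a_t * h_{t-1} + x_t along the time axis.

    a, x: (seq_len, state_dim). The carried state h is a single
    (state_dim,) vector at every step; it never grows with seq_len,
    unlike a Transformer KV cache.
    """
    def step(h, inputs):
        a_t, x_t = inputs
        h = a_t * h + x_t  # elementwise (diagonal) recurrence
        return h, h        # carry the state, also emit it as output

    h0 = jnp.zeros(x.shape[-1])
    _, hs = jax.lax.scan(step, h0, (a, x))
    return hs

# Toy usage: the carried state is (state_dim,) whether seq_len is 16 or 16,000.
key_a, key_x = jax.random.split(jax.random.PRNGKey(0))
seq_len, state_dim = 16, 8
a = jax.nn.sigmoid(jax.random.normal(key_a, (seq_len, state_dim)))  # decay in (0, 1)
x = jax.random.normal(key_x, (seq_len, state_dim))
print(linear_recurrence(a, x).shape)  # (16, 8)
```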
