TroL: Traversal of Layers for Large Language and Vision Models
June 18, 2024
Authors: Byung-Kwan Lee, Sangyun Chung, Chae Won Kim, Beomchan Park, Yong Man Ro
cs.AI
Abstract
Large language and vision models (LLVMs) have been driven by the generalization power of large language models (LLMs) and the advent of visual instruction tuning. Along with direct scaling, visual instruction tuning enables LLVMs to achieve strong vision-language (VL) performance by covering diverse tasks through natural language instructions. However, existing open-source LLVMs that perform comparably to closed-source LLVMs such as GPT-4V are often considered too large (e.g., 26B, 34B, and 110B parameters), with a correspondingly large number of layers. These large models demand costly, high-end resources for both training and inference. To address this issue, we present a new efficient LLVM family with 1.8B, 3.8B, and 7B LLM model sizes, Traversal of Layers (TroL), which reuses layers in a token-wise manner. This layer-traversing technique simulates the effect of looking back at and retracing the answering stream while increasing the number of forward-propagation layers without physically adding more layers. We demonstrate that TroL, despite its simple layer-traversing approach, efficiently outperforms open-source LLVMs with larger model sizes and rivals the performance of substantially larger closed-source LLVMs.
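The abstract does not specify how the reused layer outputs are combined per token, so the following is only a minimal PyTorch-style sketch of the general idea: a learned per-token gate (the `mixer` name and structure here are illustrative assumptions, not the paper's API) blends a layer's first-pass output with a second pass through the same layer, doubling forward-propagation depth without adding layer parameters.

```python
import torch
import torch.nn as nn


class TraversalLayerSketch(nn.Module):
    """Illustrative wrapper that traverses one transformer block twice per token."""

    def __init__(self, base_layer: nn.Module, hidden_dim: int):
        super().__init__()
        self.base_layer = base_layer  # existing block mapping (B, T, D) -> (B, T, D)
        # Hypothetical token-wise mixer: one gate value in [0, 1] per token.
        self.mixer = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        first = self.base_layer(hidden_states)   # first pass through the layer
        second = self.base_layer(first)          # reuse the same weights ("traversal")
        gate = self.mixer(first)                 # (B, T, 1): per-token traversal gate
        # Each token decides how much of the retraced representation to keep;
        # forward depth effectively doubles with no extra layer parameters.
        return gate * second + (1.0 - gate) * first


# Toy usage: a feed-forward block stands in for a full transformer layer.
block = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64))
layer = TraversalLayerSketch(block, hidden_dim=64)
out = layer(torch.randn(2, 16, 64))              # (batch=2, seq_len=16, dim=64)
```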