Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
April 18, 2024
Authors: Aitor Ormazabal, Che Zheng, Cyprien de Masson d'Autume, Dani Yogatama, Deyu Fu, Donovan Ong, Eric Chen, Eugenie Lamprecht, Hai Pham, Isaac Ong, Kaloyan Aleksiev, Lei Li, Matthew Henderson, Max Bain, Mikel Artetxe, Nishant Relan, Piotr Padlewski, Qi Liu, Ren Chen, Samuel Phua, Yazheng Yang, Yi Tay, Yuqi Wang, Zhongkai Zhu, Zhihui Xie
cs.AI
Abstract
We introduce Reka Core, Flash, and Edge, a series of powerful multimodal
language models trained from scratch by Reka. Reka models are able to process
and reason with text, images, video, and audio inputs. This technical report
discusses details of training some of these models and provides comprehensive
evaluation results. We show that Reka Edge and Reka Flash are not only
state-of-the-art but also outperform many much larger models, delivering
outsized value for their respective compute classes. Meanwhile, our most capable
and largest model, Reka Core, approaches the best frontier models on both
automatic evaluations and blind human evaluations. On image question answering
benchmarks (e.g. MMMU, VQAv2), Core performs competitively with GPT4-V.
Meanwhile, on multimodal chat, Core ranks as the second most preferred model
under a blind third-party human evaluation setup, outperforming other models
such as Claude 3 Opus. On text benchmarks, Core not only performs competitively
with other frontier models on a set of well-established benchmarks (e.g. MMLU,
GSM8K) but also outperforms GPT4-0613 on human evaluation. On video question
answering (Perception-Test), Core outperforms Gemini Ultra. Models are shipped
in production at http://chat.reka.ai . A showcase of non-cherry-picked
qualitative examples can also be found at http://showcase.reka.ai .