Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models

Abstract


We introduce Reka Core, Flash, and Edge, a series of powerful multimodal language models trained from scratch by Reka. Reka models are able to process and reason with text, images, video, and audio inputs. This technical report discusses details of training some of these models and provides comprehensive evaluation results. We show that Reka Edge and Reka Flash are not only state-of-the-art but also outperform many much larger models, delivering outsized value for their respective compute classes. Meanwhile, our most capable and largest model, Reka Core, approaches the best frontier models on both automatic evaluations and blind human evaluations. On image question answering benchmarks (e.g., MMMU, VQAv2), Core performs competitively with GPT4-V. On multimodal chat, Core ranks as the second most preferred model under a blind third-party human evaluation setup, outperforming other models such as Claude 3 Opus. On text benchmarks, Core not only performs competitively with other frontier models on a set of well-established benchmarks (e.g., MMLU, GSM8K) but also outperforms GPT4-0613 on human evaluation. On video question answering (Perception-Test), Core outperforms Gemini Ultra. Models are shipped in production at http://chat.reka.ai. A showcase of non-cherry-picked qualitative examples can also be found at http://showcase.reka.ai.
