Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models

April 18, 2024
Authors: Aitor Ormazabal, Che Zheng, Cyprien de Masson d'Autume, Dani Yogatama, Deyu Fu, Donovan Ong, Eric Chen, Eugenie Lamprecht, Hai Pham, Isaac Ong, Kaloyan Aleksiev, Lei Li, Matthew Henderson, Max Bain, Mikel Artetxe, Nishant Relan, Piotr Padlewski, Qi Liu, Ren Chen, Samuel Phua, Yazheng Yang, Yi Tay, Yuqi Wang, Zhongkai Zhu, Zhihui Xie
cs.AI

Abstract

We introduce Reka Core, Flash, and Edge, a series of powerful multimodal language models trained from scratch by Reka. Reka models are able to process and reason with text, images, video, and audio inputs. This technical report discusses details of training some of these models and provides comprehensive evaluation results. We show that Reka Edge and Reka Flash are not only state-of-the-art but also outperform many much larger models, delivering outsized value for their respective compute classes. Meanwhile, our most capable and largest model, Reka Core, approaches the best frontier models on both automatic evaluations and blind human evaluations. On image question answering benchmarks (e.g. MMMU, VQAv2), Core performs competitively with GPT4-V. On multimodal chat, Core ranks as the second most preferred model under a blind third-party human evaluation setup, outperforming other models such as Claude 3 Opus. On text benchmarks, Core not only performs competitively with other frontier models on a set of well-established benchmarks (e.g. MMLU, GSM8K) but also outperforms GPT4-0613 on human evaluation. On video question answering (Perception-Test), Core outperforms Gemini Ultra. Models are shipped in production at http://chat.reka.ai. A showcase of non-cherry-picked qualitative examples can also be found at http://showcase.reka.ai.
