Seeing and Understanding: Bridging Vision with Chemical Knowledge via ChemVLM

Abstract

In this technical report, we propose ChemVLM, the first open-source multimodal large language model dedicated to chemistry, designed to address the incompatibility between chemical image understanding and text analysis. Built on the VIT-MLP-LLM architecture, ChemVLM uses ChemLLM-20B as its base language model, which gives it robust capabilities in understanding and applying chemical text knowledge, and InternVIT-6B as a powerful image encoder. We curated high-quality data from the chemical domain, including molecules, reaction formulas, and chemistry examination data, and compiled it into a bilingual multimodal question-answering dataset. We evaluate the model on multiple open-source benchmarks and three custom evaluation sets. Experimental results show that it performs strongly, achieving state-of-the-art results in five of the six tasks evaluated. The model is available at https://huggingface.co/AI4Chem/ChemVLM-26B.
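The VIT-MLP-LLM data flow named in the abstract can be illustrated with a toy sketch: an image encoder emits one feature vector per patch, an MLP projector maps each vector into the language model's embedding space, and the projected vectors are joined with the text token embeddings to form the language model's input sequence. Everything concrete below is an assumption made for illustration and is not taken from the report: the tiny dimensions, the random weights, the ReLU activation, the two-layer projector, and the choice to place image tokens before text tokens. The function names (`mlp_projector`, `toy_forward`) are hypothetical.

```python
import random

VISION_DIM = 8   # toy size of one image-encoder patch feature (assumption)
HIDDEN_DIM = 6   # toy hidden width of the projector (assumption)
LLM_DIM = 4      # toy size of one language-model token embedding (assumption)


def make_weights(rows, cols, rng):
    """Random matrix with `rows` input features and `cols` output features."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear(x, w, b):
    """Compute y = x @ w + b for a single vector x."""
    return [sum(xi * wi for xi, wi in zip(x, col)) + bj
            for col, bj in zip(zip(*w), b)]


def mlp_projector(vision_tokens, w1, b1, w2, b2):
    """Map each patch feature from VISION_DIM to LLM_DIM with a 2-layer MLP."""
    projected = []
    for token in vision_tokens:
        hidden = [max(0.0, h) for h in linear(token, w1, b1)]  # ReLU stand-in
        projected.append(linear(hidden, w2, b2))
    return projected


def toy_forward(num_patches, num_text_tokens, seed=0):
    """Build the sequence a language model would consume in this sketch."""
    rng = random.Random(seed)
    # Stand-ins for the image encoder output and the text embedding lookup.
    vision_tokens = [[rng.random() for _ in range(VISION_DIM)]
                     for _ in range(num_patches)]
    text_embeddings = [[rng.random() for _ in range(LLM_DIM)]
                       for _ in range(num_text_tokens)]
    w1, b1 = make_weights(VISION_DIM, HIDDEN_DIM, rng), [0.0] * HIDDEN_DIM
    w2, b2 = make_weights(HIDDEN_DIM, LLM_DIM, rng), [0.0] * LLM_DIM
    image_embeddings = mlp_projector(vision_tokens, w1, b1, w2, b2)
    # Image tokens first, then text tokens (ordering is an assumption).
    return image_embeddings + text_embeddings


if __name__ == "__main__":
    sequence = toy_forward(num_patches=5, num_text_tokens=3)
    print(len(sequence), len(sequence[0]))
```

The point of the projector in this sketch is only to reconcile the two embedding widths, so that image and text tokens can share one input sequence.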
