
LLaNA: Large Language and NeRF Assistant

June 17, 2024
Authors: Andrea Amaduzzi, Pierluigi Zama Ramirez, Giuseppe Lisanti, Samuele Salti, Luigi Di Stefano
cs.AI

Abstract

Multimodal Large Language Models (MLLMs) have demonstrated an excellent understanding of images and 3D data. However, both modalities have shortcomings in holistically capturing the appearance and geometry of objects. Meanwhile, Neural Radiance Fields (NeRFs), which encode information within the weights of a simple Multi-Layer Perceptron (MLP), have emerged as an increasingly widespread modality that simultaneously encodes the geometry and photorealistic appearance of objects. This paper investigates the feasibility and effectiveness of ingesting NeRF into MLLM. We create LLaNA, the first general-purpose NeRF-language assistant capable of performing new tasks such as NeRF captioning and Q&A. Notably, our method directly processes the weights of the NeRF's MLP to extract information about the represented objects without the need to render images or materialize 3D data structures. Moreover, we build a dataset of NeRFs with text annotations for various NeRF-language tasks with no human intervention. Based on this dataset, we develop a benchmark to evaluate the NeRF understanding capability of our method. Results show that processing NeRF weights performs favourably against extracting 2D or 3D representations from NeRFs.
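To make the core idea concrete, the sketch below illustrates one plausible way to feed a NeRF's MLP weights to an LLM: flatten the trained MLP's parameters, encode them with a small weight-space encoder, and linearly project the result into the LLM's token-embedding space so it can be prepended to a text prompt. This is a minimal illustrative sketch, not the authors' released code: the `NerfWeightEncoder`, `flatten_nerf_mlp`, layer sizes, and the assumed LLM hidden size are all hypothetical stand-ins for whatever weight-space encoder and projector LLaNA actually uses.

```python
import torch
import torch.nn as nn


class NerfWeightEncoder(nn.Module):
    """Hypothetical encoder: maps a flattened vector of NeRF MLP weights to an embedding."""

    def __init__(self, num_weights: int, hidden_dim: int = 1024, embed_dim: int = 768):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_weights, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, embed_dim),
        )

    def forward(self, flat_weights: torch.Tensor) -> torch.Tensor:
        return self.net(flat_weights)


def flatten_nerf_mlp(nerf_mlp: nn.Module) -> torch.Tensor:
    """Concatenates all parameters of a NeRF MLP into a single 1-D tensor."""
    return torch.cat([p.detach().flatten() for p in nerf_mlp.parameters()])


# Toy NeRF MLP standing in for a trained instance (its weights encode one object).
nerf_mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 4))
flat = flatten_nerf_mlp(nerf_mlp)

encoder = NerfWeightEncoder(num_weights=flat.numel())
llm_hidden_size = 4096  # assumed hidden size of a LLaMA-style LLM
projector = nn.Linear(768, llm_hidden_size)

# One "NeRF token" in the LLM's embedding space; in practice it would be
# prepended to the embedded text prompt (e.g. "Describe this object")
# before the LLM forward pass, so no rendering or 3D extraction is needed.
nerf_token = projector(encoder(flat.unsqueeze(0)))
print(nerf_token.shape)  # torch.Size([1, 4096])
```

In a design like this, only the encoder and projector (and optionally the LLM) would be trained on NeRF-text pairs, while each input NeRF remains a frozen, already-trained MLP.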
