AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness

Abstract


The proliferation of multimodal memes in the social media era demands that multimodal Large Language Models (mLLMs) effectively understand meme harmfulness. Existing benchmarks for assessing mLLMs on harmful meme understanding rely on accuracy-based, model-agnostic evaluations using static datasets. These benchmarks are limited in their ability to provide up-to-date and thorough assessments, as online memes evolve dynamically. To address this, we propose AdamMeme, a flexible, agent-based evaluation framework that adaptively probes the reasoning capabilities of mLLMs in deciphering meme harmfulness. Through multi-agent collaboration, AdamMeme provides comprehensive evaluations by iteratively updating the meme data with challenging samples, thereby exposing specific limitations in how mLLMs interpret harmfulness. Extensive experiments show that our framework systematically reveals the varying performance of different target mLLMs, offering in-depth, fine-grained analyses of model-specific weaknesses. Our code is available at https://github.com/Lbotirx/AdamMeme.
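The abstract describes the evaluation loop only at a high level. As a rough illustration of the general idea of adaptive probing, the sketch below repeatedly scores a target model on a meme pool. It keeps the samples the model misclassifies, replaces the ones it handles correctly with harder variants, and tallies failures by category. This is not AdamMeme's actual multi-agent pipeline, whose agents, prompts, and scoring are not specified here. `Meme`, `adaptive_probe`, `toy_target`, and `bump_difficulty` are hypothetical names. In a real setup the target would be an mLLM, and the harder variants would come from LLM agents rather than from a difficulty counter.

```python
from collections import Counter
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Meme:
    text: str
    harmful: bool          # gold label
    category: str          # hypothetical harm category, used for fine-grained analysis
    difficulty: int = 0    # hypothetical difficulty level raised by the "generator"


def adaptive_probe(
    target: Callable[[Meme], bool],
    pool: List[Meme],
    make_harder: Callable[[Meme], Meme],
    rounds: int = 3,
) -> Tuple[List[float], Counter, List[Meme]]:
    """Evaluate `target` over several rounds, escalating the samples it gets right."""
    if not pool:
        raise ValueError("pool must contain at least one meme")
    history: List[float] = []
    failures: Counter = Counter()
    for _ in range(rounds):
        results = [(m, target(m) == m.harmful) for m in pool]
        history.append(sum(ok for _, ok in results) / len(results))
        # Tally misclassified samples by category; a meme that stays wrong
        # is counted again in every round it remains in the pool.
        failures.update(m.category for m, ok in results if not ok)
        # Keep failures (they already expose a weakness) and replace correctly
        # handled samples with harder variants for the next round.
        pool = [make_harder(m) if ok else m for m, ok in results]
    return history, failures, pool


# Toy stand-ins (hypothetical): the "target model" answers correctly only below
# difficulty 2, and the "generator" simply bumps the difficulty level.
def toy_target(m: Meme) -> bool:
    return m.harmful if m.difficulty < 2 else not m.harmful


def bump_difficulty(m: Meme) -> Meme:
    return replace(m, difficulty=m.difficulty + 1)


seed = [
    Meme("a", True, "racism"),
    Meme("b", False, "racism"),
    Meme("c", True, "sexism", 1),
    Meme("d", False, "sexism", 1),
]
history, failures, final_pool = adaptive_probe(toy_target, seed, bump_difficulty)
print(history)   # [1.0, 0.5, 0.0]
print(failures)  # Counter({'sexism': 4, 'racism': 2})
```

Retaining misclassified samples is what makes this toy loop model-specific. The pool drifts toward the cases a given target struggles with, so two different targets would end up with different pools and different per-category failure profiles.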
