Shedding Light on a Decade of Computational Linguistics in Italy: The CLiC-it Corpus

Abstract

Over the past decade, Computational Linguistics (CL) and Natural Language Processing (NLP) have evolved rapidly, especially with the advent of Transformer-based Large Language Models (LLMs). This shift has moved research goals and priorities away from Lexical and Semantic Resources and toward Language Modelling and Multimodality. In this study, we track the research trends of the Italian CL and NLP community by analysing the contributions to CLiC-it, arguably the leading Italian conference in the field. We compile the proceedings of the first 10 editions of the CLiC-it conference (from 2014 to 2024) into the CLiC-it Corpus and analyse both its metadata, including author provenance, gender, and affiliations, and the content of the papers themselves, which cover a range of topics. Our goal is to give the Italian and international research communities insight into emerging trends and key developments over time, supporting informed decisions and future directions in the field.