Sentiment Analysis of Lithuanian Online Reviews Using Large Language Models

Abstract

Sentiment analysis is a widely researched area within Natural Language Processing (NLP), attracting significant interest due to the advent of automated solutions. Despite this, the task remains challenging because of the inherent complexity of languages and the subjective nature of sentiments. It is even more challenging for less-studied and less-resourced languages such as Lithuanian. Our review of existing Lithuanian NLP research reveals that traditional machine learning methods and classification algorithms have limited effectiveness for the task. In this work, we address sentiment analysis of Lithuanian five-star-based online reviews from multiple domains, which we collect and clean. We apply transformer models to this task for the first time, exploring the capabilities of pre-trained multilingual Large Language Models (LLMs) and focusing specifically on fine-tuning BERT and T5 models. Given the inherent difficulty of the task, the fine-tuned models perform quite well, especially when the sentiments themselves are less ambiguous: they reach test recognition accuracies of 80.74% and 89.61% on the most popular one- and five-star reviews, respectively. They significantly outperform GPT-4, the current commercial state-of-the-art general-purpose LLM. We openly share our fine-tuned LLMs online.
