

CLS-RL: Image Classification with Rule-Based Reinforcement Learning

March 20, 2025
作者: Ming Li, Shitian Zhao, Jike Zhong, Yuxiang Lai, Kaipeng Zhang
cs.AI

Abstract

Classification is a core task in machine learning. Recent research has shown that although Multimodal Large Language Models (MLLMs) are initially poor at image classification, fine-tuning them with an adequate amount of data can significantly enhance their performance, making them comparable to SOTA classification models. However, acquiring large-scale labeled data is expensive. In this paper, we explore few-shot MLLM classification fine-tuning. We find that supervised fine-tuning (SFT) can cause severe overfitting and may even degrade performance relative to the zero-shot approach. To address this challenge, inspired by recent successes in rule-based reinforcement learning, we propose CLS-RL, which uses verifiable signals as rewards to fine-tune MLLMs. We find that CLS-RL outperforms SFT on most datasets and achieves much higher average accuracy in both the base-to-new and few-shot learning settings. Moreover, we observe a free-lunch phenomenon for CLS-RL: when models are fine-tuned on a particular dataset, their performance on other, distinct datasets may also improve over zero-shot models, even if those datasets differ in distribution and class names. This suggests that RL-based methods effectively teach models the fundamentals of classification. Lastly, inspired by recent work on inference-time thinking, we re-examine the 'thinking process' during fine-tuning, a critical aspect of RL-based methods, in the context of visual classification. We question whether such tasks require an extensive thinking process during fine-tuning, proposing that it may actually detract from performance. Based on this premise, we introduce No-Thinking-CLS-RL, which minimizes the thinking process during training by setting an equality accuracy reward. Our findings indicate that, with much less fine-tuning time, No-Thinking-CLS-RL achieves superior in-domain performance and generalization compared to CLS-RL.
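The "verifiable signals as rewards" idea can be illustrated with a minimal sketch of rule-based reward functions: a format reward that checks the completion follows an expected template, and an accuracy reward that exact-matches the predicted class name against the ground-truth label. The `<think>`/`<answer>` tag names and the exact-match rule here are illustrative assumptions, not the paper's exact implementation.

```python
import re

def accuracy_reward(completion: str, gold_label: str) -> float:
    """Rule-based accuracy reward: 1.0 if the class name inside
    <answer>...</answer> exactly matches the ground-truth label
    (case-insensitive), else 0.0. Tag names are assumed for illustration."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match is None:
        return 0.0
    pred = match.group(1).strip().lower()
    return 1.0 if pred == gold_label.strip().lower() else 0.0

def format_reward(completion: str) -> float:
    """Reward well-formed outputs: a <think> block followed by an
    <answer> block. A no-thinking variant could instead reward
    answers given without the <think> block."""
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return 1.0 if re.fullmatch(pattern, completion.strip(), re.DOTALL) else 0.0
```

Because both rewards are computed by deterministic rules rather than a learned reward model, they are directly verifiable, which is what makes this style of RL fine-tuning applicable to classification labels.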

