

SemiEvol: Semi-supervised Fine-tuning for LLM Adaptation

October 17, 2024
作者: Junyu Luo, Xiao Luo, Xiusi Chen, Zhiping Xiao, Wei Ju, Ming Zhang
cs.AI

Abstract

Supervised fine-tuning (SFT) is crucial for adapting large language models (LLMs) to a specific domain or task. However, only a limited amount of labeled data is available in practical applications, which poses a severe challenge for SFT in yielding satisfactory results. Therefore, a data-efficient framework that can fully exploit labeled and unlabeled data for LLM fine-tuning is highly desirable. Towards this end, we introduce SemiEvol, a semi-supervised fine-tuning framework for LLM adaptation that operates in a propagate-and-select manner. For knowledge propagation, SemiEvol adopts a bi-level approach, propagating knowledge from labeled data to unlabeled data through both in-weight and in-context methods. For knowledge selection, SemiEvol incorporates a collaborative learning mechanism, selecting higher-quality pseudo-response samples. We conducted experiments using GPT-4o-mini and Llama-3.1 on seven general or domain-specific datasets, demonstrating significant improvements in model performance on target data. Furthermore, we compared SemiEvol with SFT and self-evolution methods, highlighting its practicality in hybrid data scenarios.
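The propagate-and-select idea in the abstract can be sketched in toy form. The snippet below is a minimal illustration, not the paper's implementation: the word-overlap retrieval standing in for in-context propagation and the majority-vote stand-in for collaborative pseudo-response selection are assumptions, and the function names (`propagate`, `collaborative_select`) are hypothetical.

```python
from collections import Counter

def propagate(labeled, unlabeled_queries, k=1):
    """In-context propagation (toy): attach the k labeled examples that
    share the most words with each unlabeled query as few-shot context.
    (In SemiEvol, in-weight propagation would additionally fine-tune the
    model on the labeled data.)"""
    contexts = []
    for q in unlabeled_queries:
        scored = sorted(
            labeled,
            key=lambda ex: -len(set(ex[0].split()) & set(q.split())),
        )
        contexts.append((q, scored[:k]))
    return contexts

def collaborative_select(query, collaborators, threshold=0.5):
    """Knowledge selection (toy): several collaborator models answer the
    same query; keep the pseudo-response only if a strict majority
    agrees, otherwise discard the sample as low quality."""
    answers = [model(query) for model in collaborators]
    answer, votes = Counter(answers).most_common(1)[0]
    return answer if votes / len(answers) > threshold else None
```

A selected pseudo-response would then join the training set for the next fine-tuning round, which is what makes the framework data-efficient: unlabeled queries contribute supervision only when the collaborators agree.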

