Feasible Learning
January 24, 2025
Authors: Juan Ramirez, Ignacio Hounie, Juan Elenter, Jose Gallego-Posada, Meraj Hashemizadeh, Alejandro Ribeiro, Simon Lacoste-Julien
cs.AI
Abstract
We introduce Feasible Learning (FL), a sample-centric learning paradigm where
models are trained by solving a feasibility problem that bounds the loss for
each training sample. In contrast to the ubiquitous Empirical Risk Minimization
(ERM) framework, which optimizes for average performance, FL demands
satisfactory performance on every individual data point. Since any model that
meets the prescribed performance threshold is a valid FL solution, the choice
of optimization algorithm and its dynamics play a crucial role in shaping the
properties of the resulting solutions. In particular, we study a primal-dual
approach which dynamically re-weights the importance of each sample during
training. To address the challenge of setting a meaningful threshold in
practice, we introduce a relaxation of FL that incorporates slack variables of
minimal norm. Our empirical analysis, spanning image classification, age
regression, and preference optimization in large language models, demonstrates
that models trained via FL can learn from data while displaying improved tail
behavior compared to ERM, with only a marginal impact on average performance.
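To make the contrast with ERM concrete, the feasibility problem described in the abstract can be written as follows. This is a plausible formalization inferred from the text, not the paper's exact notation: here $\ell_i(\theta)$ denotes the loss of model $\theta$ on training sample $i$, $\epsilon$ is the prescribed performance threshold, and the choice of squared L2 norm on the slacks is an assumption.

```latex
% ERM: optimize average performance over the training set
\min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \ell_i(\theta)

% FL: a feasibility problem bounding the loss of every sample
\text{find} \;\; \theta
  \quad \text{s.t.} \quad \ell_i(\theta) \le \epsilon,
  \quad i = 1, \dots, n

% Relaxed FL: slack variables s_i of minimal norm absorb
% constraints that a fixed threshold cannot satisfy
\min_{\theta,\, s} \; \| s \|_2^2
  \quad \text{s.t.} \quad \ell_i(\theta) \le \epsilon + s_i,
  \quad s_i \ge 0
```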
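The primal-dual approach mentioned in the abstract can be sketched as below. This is a minimal illustrative implementation, not the authors' exact algorithm: it assumes projected gradient ascent on per-sample dual variables, and the names `epsilon`, `dual_lr`, and `primal_dual_fl_step` are hypothetical.

```python
import torch

def primal_dual_fl_step(model, batch_x, batch_y, batch_idx, lambdas,
                        loss_fn, opt, epsilon=0.1, dual_lr=0.01):
    """One primal-dual step on the FL Lagrangian. The dual variable
    lambdas[i] acts as a dynamic per-sample weight: it grows while the
    constraint loss_i <= epsilon is violated and decays toward zero
    once the sample meets the threshold."""
    # Per-sample losses; loss_fn must not reduce over the batch.
    per_sample_loss = loss_fn(model(batch_x), batch_y)  # shape: (batch,)
    violation = per_sample_loss - epsilon               # constraint residual

    # Primal update: descend the Lagrangian, re-weighting each
    # sample's gradient by its current multiplier.
    opt.zero_grad()
    (lambdas[batch_idx].detach() * per_sample_loss).mean().backward()
    opt.step()

    # Dual update: projected gradient ascent, keeping multipliers >= 0.
    with torch.no_grad():
        lambdas[batch_idx] = torch.clamp(
            lambdas[batch_idx] + dual_lr * violation.detach(), min=0.0)
```

In this sketch, `loss_fn` is assumed to return unreduced per-sample losses (e.g., `torch.nn.CrossEntropyLoss(reduction='none')`) and `lambdas` is a length-n tensor of nonnegative multipliers, one per training sample. Because the pure feasibility problem has no objective beyond its constraints, the primal step reduces to descent on a dynamically re-weighted loss, which is what lets the optimization dynamics shape which feasible solution is found.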