Feasible Learning

Abstract

We introduce Feasible Learning (FL), a sample-centric learning paradigm in which models are trained by solving a feasibility problem that bounds the loss on each training sample. In contrast to the ubiquitous Empirical Risk Minimization (ERM) framework, which optimizes for average performance, FL demands satisfactory performance on every individual data point. Since any model that meets the prescribed performance threshold is a valid FL solution, the choice of optimization algorithm and its dynamics play a crucial role in shaping the properties of the resulting solutions. In particular, we study a primal-dual approach that dynamically re-weights the importance of each sample during training. To address the challenge of setting a meaningful threshold in practice, we introduce a relaxation of FL that incorporates slack variables of minimal norm. Our empirical analysis, spanning image classification, age regression, and preference optimization in large language models, demonstrates that models trained via FL can learn from data while displaying improved tail behavior compared to ERM, with only a marginal impact on average performance.
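
As a rough illustration of the primal-dual idea described above, the following minimal sketch trains a linear model so that every per-sample squared loss falls below a threshold. It is not taken from the paper: the function name `feasible_learning`, the squared loss, the linear model, the step sizes, and the Lagrangian form (a sum of `lam[i] * (loss[i] - eps)` with non-negative multipliers) are all assumptions chosen for illustration. The slack-variable relaxation is not shown.

```python
import numpy as np

def feasible_learning(X, y, eps, lr_primal=0.01, lr_dual=0.01, steps=2000):
    """Hypothetical sketch: find theta with (x_i . theta - y_i)^2 <= eps for all i.

    Assumed Lagrangian: sum_i lam_i * (loss_i(theta) - eps), with lam_i >= 0.
    Gradient descent on theta, projected gradient ascent on lam.
    """
    n, d = X.shape
    theta = np.zeros(d)
    lam = np.zeros(n)  # one multiplier (sample weight) per training point
    for _ in range(steps):
        residual = X @ theta - y
        losses = residual ** 2
        # Primal step: descend on the lam-weighted sum of per-sample losses.
        grad_theta = 2.0 * X.T @ (lam * residual)
        theta = theta - lr_primal * grad_theta
        # Dual step: raise the weight of samples violating the bound,
        # lower it for satisfied samples, and project back onto lam >= 0.
        lam = np.maximum(0.0, lam + lr_dual * (losses - eps))
    return theta, lam

# Toy, noiseless 1-D data (assumed for illustration): y = 2 * x.
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
theta, lam = feasible_learning(X, y, eps=0.01)
print("theta:", theta)
print("per-sample losses:", (X @ theta - y) ** 2)
print("multipliers:", lam)
```

In this sketch the multipliers act as per-sample weights: points whose loss exceeds the threshold accumulate weight and pull the model harder, while points that already satisfy the bound gradually lose influence. Under ERM, by contrast, every sample would keep a fixed weight of 1/n throughout training.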