Tailoring Self-Rationalizers with Multi-Reward Distillation

Abstract

Large language models (LMs) are capable of generating free-text rationales to aid question answering. However, prior work 1) suggests that useful self-rationalization emerges only at significant scale (e.g., the 175B-parameter GPT-3), and 2) focuses largely on downstream performance, ignoring the semantics of the rationales themselves, e.g., whether they are faithful, true, and helpful to humans. In this work, we enable small-scale LMs (approx. 200x smaller than GPT-3) to generate rationales that not only improve downstream task performance but are also more plausible, consistent, and diverse, as assessed by both automatic and human evaluation. Our method, MaRio (Multi-rewArd RatIOnalization), is a multi-reward conditioned self-rationalization algorithm that optimizes multiple distinct properties such as plausibility, diversity, and consistency. Results on five difficult question-answering datasets (StrategyQA, QuaRel, OpenBookQA, NumerSense, and QASC) show that MaRio not only improves task accuracy but also improves the self-rationalization quality of small LMs along these axes more than a supervised fine-tuning (SFT) baseline does. Extensive human evaluations confirm that MaRio rationales are preferred over SFT rationales and show qualitative improvements in plausibility and consistency.
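The abstract does not spell out how MaRio conditions generation on several rewards at once. One common pattern for reward-conditioned generation is to quantize each reward score into bins, prepend one control token per reward to the model input during training, and request the top bin for every reward at inference time. The sketch below illustrates only that pattern, under stated assumptions: the bin edges, the control-token format, the target template, and the helper names (`quantize`, `control_prefix`, `training_example`, `inference_prefix`) are hypothetical and not taken from the paper, and the reward models and training loop are omitted entirely.

```python
from bisect import bisect_right

# The three properties named in the abstract; the token naming scheme is hypothetical.
REWARDS = ("plausibility", "diversity", "consistency")


def quantize(score: float, edges: list[float]) -> int:
    """Map a scalar reward to a bin index in 0..len(edges), given sorted cut points."""
    return bisect_right(edges, score)


def control_prefix(scores: dict[str, float], edges: dict[str, list[float]]) -> str:
    """Build one hypothetical control token per reward, e.g. '<plausibility_2>'."""
    return " ".join(
        f"<{name}_{quantize(scores[name], edges[name])}>" for name in REWARDS
    )


def training_example(
    question: str,
    rationale: str,
    answer: str,
    scores: dict[str, float],
    edges: dict[str, list[float]],
) -> tuple[str, str]:
    """Return (source, target): the source carries the reward tokens plus the
    question; the target is the rationale followed by the answer (hypothetical template)."""
    source = f"{control_prefix(scores, edges)} {question}"
    target = f"{rationale} Answer: {answer}"
    return source, target


def inference_prefix(edges: dict[str, list[float]]) -> str:
    """At generation time, request the highest bin for every reward."""
    return " ".join(f"<{name}_{len(edges[name])}>" for name in REWARDS)


if __name__ == "__main__":
    edges = {name: [0.25, 0.5, 0.75] for name in REWARDS}
    scores = {"plausibility": 0.8, "diversity": 0.3, "consistency": 0.6}
    print(control_prefix(scores, edges))
    print(inference_prefix(edges))
```

With three cut points per reward, a sampled rationale scoring 0.8, 0.3, and 0.6 is tagged `<plausibility_3> <diversity_1> <consistency_2>`, while generation is prompted with `<plausibility_3> <diversity_3> <consistency_3>`. Keeping one token per property, rather than collapsing all rewards into a single scalar, lets each property be requested independently.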
