Tailoring Self-Rationalizers with Multi-Reward Distillation
November 6, 2023
作者: Sahana Ramnath, Brihi Joshi, Skyler Hallinan, Ximing Lu, Liunian Harold Li, Aaron Chan, Jack Hessel, Yejin Choi, Xiang Ren
cs.AI
Abstract
Large language models (LMs) are capable of generating free-text rationales to
aid question answering. However, prior work 1) suggests that useful
self-rationalization is emergent only at significant scales (e.g., 175B
parameter GPT-3); and 2) focuses largely on downstream performance, ignoring
the semantics of the rationales themselves, e.g., are they faithful, true, and
helpful for humans? In this work, we enable small-scale LMs (approx. 200x
smaller than GPT-3) to generate rationales that not only improve downstream
task performance, but are also more plausible, consistent, and diverse,
assessed both by automatic and human evaluation. Our method, MaRio
(Multi-rewArd RatIOnalization), is a multi-reward conditioned
self-rationalization algorithm that optimizes multiple distinct properties like
plausibility, diversity and consistency. Results on five difficult
question-answering datasets (StrategyQA, QuaRel, OpenBookQA, NumerSense, and QASC)
show that not only does MaRio improve task accuracy, but it also improves the
self-rationalization quality of small LMs across the aforementioned axes better
than a supervised fine-tuning (SFT) baseline. Extensive human evaluations
confirm that MaRio rationales are preferred over SFT rationales and show
qualitative improvements in plausibility and consistency.
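
For intuition, the sketch below shows one common way to condition a rationale generator on multiple reward signals: score each sampled rationale for every property, quantize each score into a discrete bin, and prepend the resulting control tokens to the training input (a Quark-style recipe). The reward names, bin counts, and token format here are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of multi-reward conditioning for rationale generation.
# The property names, scoring functions, quantization scheme, and token
# format are assumptions for illustration only.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RewardSpec:
    name: str                           # e.g. "plausibility", "diversity", "consistency"
    score: Callable[[str, str], float]  # (question, rationale) -> score in [0, 1]
    num_bins: int = 5                   # number of discrete reward levels


def reward_tokens(question: str, rationale: str, specs: List[RewardSpec]) -> str:
    """Map each property score to a control token such as <plausibility_4>.

    Prepending these tokens to training inputs lets the model learn what each
    reward level "looks like"; at inference time, asking for the highest bin of
    every property requests rationales that score well on all of them at once.
    """
    tokens = []
    for spec in specs:
        s = min(max(spec.score(question, rationale), 0.0), 1.0)   # clamp to [0, 1]
        bin_idx = min(int(s * spec.num_bins), spec.num_bins - 1)  # quantize
        tokens.append(f"<{spec.name}_{bin_idx}>")
    return " ".join(tokens)


# Usage: at training time, prefix each (question, rationale) pair with its
# reward tokens; at inference time, prefix the question with the top bins,
# e.g. "<plausibility_4> <diversity_4> <consistency_4>", to steer generation.
```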