AI Can Learn Scientific Taste

Abstract

Great scientists have strong judgement and foresight, closely tied to what we call scientific taste. Here, we use the term to refer to the capacity to judge and propose research ideas with high potential impact. However, most related research focuses on improving an AI scientist's execution capability, while enhancing an AI's scientific taste remains underexplored. In this work, we propose Reinforcement Learning from Community Feedback (RLCF), a training paradigm that uses large-scale community signals as supervision, and formulate scientific taste learning as a preference modeling and alignment problem. For preference modeling, we train Scientific Judge on 700K field- and time-matched pairs of high- vs. low-citation papers to judge ideas. For preference alignment, using Scientific Judge as a reward model, we train a policy model, Scientific Thinker, to propose research ideas with high potential impact. Experiments show that Scientific Judge outperforms SOTA LLMs (e.g., GPT-5.2, Gemini 3 Pro) and generalizes to future-year tests, unseen fields, and peer-review preferences. Furthermore, Scientific Thinker proposes research ideas with higher potential impact than baselines. Our findings show that AI can learn scientific taste, marking a key step toward human-level AI scientists.
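The pairing step for preference modeling can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not say how pairs are formed, so the `Paper` record, bucketing by (field, publication year), and the `top_frac`/`bottom_frac` citation cut-offs are all hypothetical choices. The `pairwise_accuracy` helper shows how any scoring function, standing in for a judge, could be evaluated on such pairs.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product
import random


@dataclass(frozen=True)
class Paper:
    # Hypothetical record; these fields are assumptions for illustration.
    paper_id: str
    field: str
    year: int
    citations: int


def build_matched_pairs(papers, top_frac=0.2, bottom_frac=0.2, seed=0):
    """Pair high- and low-citation papers that share the same field and year.

    Assumption: "high" and "low" mean the top and bottom citation fractions
    within each (field, year) bucket; the real thresholds are not stated.
    """
    buckets = defaultdict(list)
    for p in papers:
        buckets[(p.field, p.year)].append(p)

    rng = random.Random(seed)
    pairs = []
    for key in sorted(buckets):
        # Sort each bucket from most to least cited.
        group = sorted(buckets[key], key=lambda p: p.citations, reverse=True)
        n = len(group)
        k_hi = max(1, int(n * top_frac))
        k_lo = max(1, int(n * bottom_frac))
        if k_hi + k_lo > n:
            continue  # too few papers to separate high from low
        high, low = group[:k_hi], group[n - k_lo:]
        for h, l in product(high, low):
            if h.citations > l.citations:  # skip ties
                pairs.append((h, l))
    rng.shuffle(pairs)  # avoid ordering pairs by bucket
    return pairs


def pairwise_accuracy(score, pairs):
    """Fraction of pairs in which `score` ranks the high-citation paper higher."""
    if not pairs:
        return 0.0
    wins = sum(score(h) > score(l) for h, l in pairs)
    return wins / len(pairs)


papers = [
    Paper("a", "cs", 2020, 100),
    Paper("b", "cs", 2020, 50),
    Paper("c", "cs", 2020, 5),
    Paper("d", "cs", 2020, 1),
    Paper("e", "bio", 2020, 30),  # alone in its bucket, so never paired
]
pairs = build_matched_pairs(papers, top_frac=0.5, bottom_frac=0.5)
print(len(pairs), pairwise_accuracy(lambda p: p.citations, pairs))
```

Bucketing by field and year before pairing keeps citation counts comparable, since older papers and larger fields tend to accumulate more citations; this is one plausible reading of "field- and time-matched", offered here only as an assumption.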