AI Can Learn Scientific Taste

Abstract


Great scientists have strong judgement and foresight, closely tied to what we call scientific taste. Here, we use the term to refer to the capacity to judge and propose research ideas with high potential impact. However, most related research focuses on improving an AI scientist's execution capability, while enhancing an AI's scientific taste remains underexplored. In this work, we propose Reinforcement Learning from Community Feedback (RLCF), a training paradigm that uses large-scale community signals as supervision, and we formulate scientific taste learning as a preference modeling and alignment problem. For preference modeling, we train Scientific Judge on 700K field- and time-matched pairs of high- vs. low-citation papers to judge research ideas. For preference alignment, we use Scientific Judge as a reward model to train a policy model, Scientific Thinker, that proposes research ideas with high potential impact. Experiments show that Scientific Judge outperforms SOTA LLMs (e.g., GPT-5.2, Gemini 3 Pro) and generalizes to future-year test sets, unseen fields, and peer-review preferences. Furthermore, Scientific Thinker proposes research ideas with higher potential impact than baselines. Our findings show that AI can learn scientific taste, marking a key step toward human-level AI scientists.
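The abstract does not specify how the matched pairs are built or which objective trains the judge. The following minimal Python sketch illustrates the general preference-modeling setup under explicitly stated assumptions. Papers are bucketed by (field, year). Within each bucket, the top and bottom fractions by citation count are paired, and the 20% thresholds are hypothetical. The judge is treated as a scalar scorer trained with a Bradley-Terry pairwise loss, a common reward-modeling choice that may differ from the authors' actual method. `Paper`, `build_matched_pairs`, `bradley_terry_loss`, and `pairwise_accuracy` are illustrative names, not the paper's code.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Paper:
    title: str
    field: str
    year: int
    citations: int


def build_matched_pairs(papers, top_frac=0.2, bottom_frac=0.2):
    """Pair high- and low-citation papers from the same (field, year) bucket.

    Hypothetical pairing rule: top_frac / bottom_frac are illustrative
    thresholds, not values reported in the paper.
    """
    buckets = defaultdict(list)
    for p in papers:
        buckets[(p.field, p.year)].append(p)

    pairs = []
    for key in sorted(buckets):
        group = sorted(buckets[key], key=lambda p: p.citations, reverse=True)
        k_top = max(1, int(len(group) * top_frac))
        k_bot = max(1, int(len(group) * bottom_frac))
        high, low = group[:k_top], group[-k_bot:]
        for h, l in product(high, low):
            # Skip ties and self-pairs (e.g. single-paper buckets).
            if h.citations > l.citations:
                pairs.append((h, l))
    return pairs


def bradley_terry_loss(score_high, score_low):
    """-log sigmoid(score_high - score_low), computed in a numerically stable way.

    Small when the judge scores the high-citation paper above the low-citation one.
    """
    margin = score_high - score_low
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def pairwise_accuracy(pairs, score):
    """Fraction of pairs where the scorer ranks the high-citation paper higher."""
    if not pairs:
        return 0.0
    correct = sum(score(h) > score(l) for h, l in pairs)
    return correct / len(pairs)


if __name__ == "__main__":
    papers = [
        Paper("A", "NLP", 2020, 500),
        Paper("B", "NLP", 2020, 3),
        Paper("C", "NLP", 2021, 40),
        Paper("D", "CV", 2020, 900),
    ]
    pairs = build_matched_pairs(papers)
    print([(h.title, l.title) for h, l in pairs])  # [('A', 'B')]
```

Matching on field and year keeps the comparison within a single field and year, so raw citation counts are not compared across fields or across papers of different ages. In the alignment stage, a scorer trained this way would supply the scalar reward for the policy model. How the authors actually convert judge outputs into rewards is not stated in the abstract.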