PaperFlow: Profiling, Recommending, and Adapting Across Daily Paper Streams

Abstract

Scientific paper recommendation is typically evaluated as static ranking over a fixed candidate set, yet real scientific reading unfolds as a daily, longitudinal process in which interests shift and feedback accumulates. We introduce PaperFlow, a framework that organizes this process into three coupled stages. Profiling constructs and maintains a structured, inspectable scholarly profile from heterogeneous cold-start evidence. Recommending ranks each date-specific paper stream through multi-signal aggregation under a fixed display budget. Adapting updates the user state from semantically distinct feedback signals and models interest drift across days. We further define a longitudinal user-day benchmark that fixes users, dates, candidate pools, visible inputs, and hidden simulated relevance labels under a shared temporal information boundary. The benchmark contains 24 simulated research users, 50 daily paper streams, 1,200 user-day episodes, 20,727 unique papers, and 497,448 episode-paper records. We additionally specify a blind human-evaluation protocol to validate the alignment between automatic metrics and expert judgments. In experiments against five scientific recommendation baselines, PaperFlow achieves the strongest oracle-based ranking performance, the highest behavioral alignment with simulated reading selections, and the best blind human-evaluation score.
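The minimal Python sketch below illustrates the user-day episode structure and ranking under a fixed display budget. It is not PaperFlow's implementation, and nearly every detail is hypothetical:

- the class and function names;
- the signal names (`semantic`, `author`) and weights;
- the feedback types and dates;
- the weighted-sum aggregator, which stands in for the unspecified multi-signal aggregation.

The boundary rule is also an assumption: a day-d episode sees only feedback from strictly earlier days. The reported counts are consistent with a full grid of 24 users × 50 daily streams = 1,200 episodes, and the sketch reproduces that grid.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from itertools import product


@dataclass
class Candidate:
    paper_id: str
    # Hypothetical per-signal relevance scores, e.g. {"semantic": 0.8, "author": 0.2}.
    signals: dict[str, float]


@dataclass
class UserDayEpisode:
    user_id: str
    day: date
    candidates: list[Candidate]
    # Hypothetical feedback log of (day, paper_id, feedback_type) tuples.
    feedback: list[tuple[date, str, str]] = field(default_factory=list)

    def visible_feedback(self) -> list[tuple[date, str, str]]:
        # Assumed temporal boundary: only feedback from strictly earlier days is visible.
        return [f for f in self.feedback if f[0] < self.day]


def build_episodes(user_ids: list[str], days: list[date]) -> list[UserDayEpisode]:
    # One episode per (user, day) pair, i.e. a full user x day grid.
    return [UserDayEpisode(u, d, []) for u, d in product(user_ids, days)]


def recommend(episode: UserDayEpisode, weights: dict[str, float], budget: int) -> list[str]:
    # Stand-in aggregator: weighted sum of signals; a missing signal counts as 0.
    def score(c: Candidate) -> float:
        return sum(w * c.signals.get(name, 0.0) for name, w in weights.items())

    # Sort by descending score, break ties by paper_id, then keep the display budget.
    ranked = sorted(episode.candidates, key=lambda c: (-score(c), c.paper_id))
    return [c.paper_id for c in ranked[:budget]]


# Hypothetical grid matching the reported counts: 24 users x 50 daily streams.
users = [f"u{i:02d}" for i in range(24)]
days = [date(2025, 1, 1) + timedelta(days=k) for k in range(50)]
episodes = build_episodes(users, days)
print(len(episodes))  # 1200

ep = UserDayEpisode(
    "u00",
    date(2025, 1, 2),
    [
        Candidate("p1", {"semantic": 0.9, "author": 0.1}),
        Candidate("p2", {"semantic": 0.4, "author": 0.9}),
        Candidate("p3", {"semantic": 0.2}),
    ],
    feedback=[(date(2025, 1, 1), "p0", "save"), (date(2025, 1, 2), "p1", "click")],
)
print(recommend(ep, {"semantic": 0.7, "author": 0.3}, budget=2))  # ['p1', 'p2']
print(ep.visible_feedback())  # only the 2025-01-01 "save" entry
```

With these made-up weights, `p1` scores 0.66 and `p2` scores 0.55, so a display budget of 2 shows exactly those two papers. The same-day "click" on `p1` stays hidden from the day-2 episode under the assumed boundary.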