PaperFlow: Profiling, Recommending, and Adapting across Daily Paper Streams

Abstract

Scientific paper recommendation is typically evaluated as static ranking over a fixed candidate set, yet real scientific reading unfolds as a daily, longitudinal process in which interests shift and feedback accumulates. We introduce PaperFlow, a framework that organizes this process into three coupled stages: Profiling, which constructs and maintains a structured, inspectable scholarly profile from heterogeneous cold-start evidence; Recommending, which ranks each date-specific paper stream through multi-signal aggregation under a fixed display budget; and Adapting, which updates user state from semantically distinct feedback signals and models interest drift across days. We further define a longitudinal user-day benchmark that fixes users, dates, candidate pools, visible inputs, and hidden simulated relevance labels under a shared temporal information boundary. The benchmark contains 24 simulated research users, 50 daily paper streams, 1,200 user-day episodes, 20,727 unique papers, and 497,448 episode-paper records. We additionally specify a blind human-evaluation protocol to validate alignment between automatic metrics and expert judgments. In experiments against five scientific recommendation baselines, PaperFlow achieves the strongest oracle-based ranking performance, the highest behavioral alignment with simulated reading selections, and the best blind human-evaluation score.
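
The benchmark counts quoted above can be cross-checked with simple arithmetic. The sketch below is illustrative only: the constant names and the function `check_benchmark_counts` are hypothetical. The second assertion rests on an assumption the abstract does not state, namely that all users share the same daily candidate pool and each unique paper appears in exactly one daily stream. The stated totals are consistent with that reading, but they do not prove it.

```python
# Counts quoted in the abstract.
USERS = 24
DAYS = 50
EPISODES = 1_200
UNIQUE_PAPERS = 20_727
EPISODE_PAPER_RECORDS = 497_448


def check_benchmark_counts():
    """Check that the stated counts agree and return derived averages."""
    # One episode per (user, day) pair.
    assert USERS * DAYS == EPISODES
    # Assumption (not stated in the source): every user sees the same
    # daily pool and each unique paper belongs to exactly one stream,
    # so each paper contributes one record per user.
    assert USERS * UNIQUE_PAPERS == EPISODE_PAPER_RECORDS
    return {
        "avg_candidates_per_episode": EPISODE_PAPER_RECORDS / EPISODES,
        "avg_papers_per_daily_stream": UNIQUE_PAPERS / DAYS,
    }


if __name__ == "__main__":
    print(check_benchmark_counts())
```

Under that assumption, both derived averages come to about 414.5 candidate papers, per user-day episode and per daily stream respectively.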