METIS: Mentoring Engine for Thoughtful Inquiry & Solutions

January 19, 2026
Authors: Abhinav Rajeev Kumar, Dhruv Trehan, Paras Chopra
cs.AI

Abstract

Many students lack access to expert research mentorship. We ask whether an AI mentor can move undergraduates from an idea to a paper. We build METIS, a tool-augmented, stage-aware assistant with literature search, curated guidelines, methodology checks, and memory. We evaluate METIS against GPT-5 and Claude Sonnet 4.5 across six writing stages using LLM-as-a-judge pairwise preferences, student-persona rubrics, short multi-turn tutoring, and evidence/compliance checks. On 90 single-turn prompts, LLM judges preferred METIS to Claude Sonnet 4.5 in 71% of comparisons and to GPT-5 in 54%. Student-persona scores (clarity/actionability/constraint-fit; 90 prompts x 3 judges) are higher for METIS across stages. In multi-turn sessions (five scenarios/agent), METIS yields slightly higher final quality than GPT-5. Gains concentrate in document-grounded stages (D-F), consistent with stage-aware routing and grounding. Failure modes include premature tool routing, shallow grounding, and occasional stage misclassification.
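As a rough illustration of how a pairwise preference rate like those reported above might be computed, the sketch below tallies one judge verdict per prompt. It is not the authors' evaluation code: the function name `preference_rate`, the label strings, the toy verdicts, and the choice to count ties in the denominator are all assumptions made for this example.

```python
from collections import Counter


def preference_rate(verdicts, system="METIS"):
    """Return the fraction of comparisons in which `system` was preferred.

    `verdicts` holds one label per prompt naming the preferred system,
    or "tie". Ties stay in the denominator (an assumption of this sketch;
    the abstract does not say how ties were handled).
    """
    if not verdicts:
        raise ValueError("verdicts must not be empty")
    counts = Counter(verdicts)
    return counts[system] / len(verdicts)


# Hypothetical toy verdicts, NOT data from the paper.
toy_verdicts = ["METIS", "METIS", "baseline", "METIS", "tie"]
rate = preference_rate(toy_verdicts)
print(f"METIS preferred in {rate:.0%} of comparisons")
```

With several judges per prompt, one option is to aggregate their verdicts (for example by majority vote) before calling the function. The abstract does not specify which aggregation was used.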