ScheMatiQ: From Research Question to Structured Data through Interactive Schema Discovery

April 10, 2026
Authors: Shahar Levy, Eliya Habba, Reshef Mintz, Barak Raveh, Renana Keydar, Gabriel Stanovsky
cs.AI

Abstract

Many disciplines pose natural-language research questions over large document collections, and answering them typically requires structured evidence. Traditionally, that evidence is obtained by manually designing an annotation schema and exhaustively labeling the corpus, a slow and error-prone process. We introduce ScheMatiQ, which uses calls to a backbone LLM to turn a question and a corpus into a schema and a grounded database, with a web interface that lets users steer and revise the extraction. In collaboration with domain experts, we show that ScheMatiQ yields outputs that support real-world analysis in law and computational biology. We release ScheMatiQ as open source with a public web interface, and invite experts across disciplines to use it with their own data. All resources, including the website, source code, and demonstration video, are available at: www.ScheMatiQ-ai.com
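To make the idea of a "grounded database" concrete, here is a minimal Python sketch of one possible reading: every extracted value carries a verbatim evidence quote, and a value is kept only if its column belongs to the schema and its quote actually occurs in the cited document. This is an illustration under stated assumptions, not ScheMatiQ's implementation. The names `GroundedCell` and `is_grounded`, the toy corpus, and the single-column schema are all hypothetical, and no LLM is called here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundedCell:
    """One extracted value plus the passage claimed to support it (hypothetical type)."""
    column: str    # schema column the value belongs to
    value: str     # extracted value
    doc_id: str    # document the evidence is attributed to
    evidence: str  # verbatim quote that should appear in that document


def is_grounded(cell: GroundedCell, corpus: dict[str, str]) -> bool:
    """Return True if the evidence quote is non-empty and occurs verbatim in the cited document."""
    text = corpus.get(cell.doc_id, "")
    return bool(cell.evidence) and cell.evidence in text


# Toy corpus and schema, invented for this example only.
corpus = {"case_001": "The court held that the statute applies retroactively."}
schema = ["ruling"]

# Candidate cells, as an extraction step might propose them.
cells = [
    GroundedCell("ruling", "applies retroactively", "case_001",
                 "the statute applies retroactively"),
    GroundedCell("ruling", "struck down", "case_001",
                 "the statute was struck down"),  # quote is not in the document
]

# Keep only cells that fit the schema and whose evidence can be verified.
kept = [c for c in cells if c.column in schema and is_grounded(c, corpus)]
```

In this sketch the second candidate is dropped because its quote does not occur in `case_001`. A real system would likely need fuzzier matching than an exact substring test.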
April 14, 2026