ScheMatiQ: From Research Question to Structured Data through Interactive Schema Discovery

Abstract

Many disciplines pose natural-language research questions over large document collections, and answering them typically requires structured evidence. Such evidence has traditionally been obtained by manually designing an annotation schema and exhaustively labeling the corpus, a slow and error-prone process. We introduce ScheMatiQ, which uses calls to a backbone LLM to take a question and a corpus and produce a schema and a grounded database. It also provides a web interface that lets users steer and revise the extraction. In collaboration with domain experts, we show that ScheMatiQ yields outputs that support real-world analysis in law and computational biology. We release ScheMatiQ as open source with a public web interface, and invite experts across disciplines to use it with their own data. All resources, including the website, source code, and demonstration video, are available at www.ScheMatiQ-ai.com.
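The following minimal Python sketch illustrates what "grounded" can mean for a database cell. It rests on our own assumptions: each cell stores an extracted value together with the source document id and a verbatim evidence quote, and a cell counts as grounded only if that quote occurs in the cited document. The names `Cell`, `is_grounded`, and `check_rows`, the example data, and the verbatim-match rule are all hypothetical. They are not ScheMatiQ's actual data model or API, and no LLM call is shown.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical cell layout: an extracted value plus the evidence it rests on."""
    value: str
    doc_id: str
    evidence: str  # assumed to be a verbatim quote from the source document


def is_grounded(cell: Cell, corpus: dict[str, str]) -> bool:
    """Assumed rule: grounded iff the quote appears verbatim in the cited document."""
    text = corpus.get(cell.doc_id)
    return text is not None and cell.evidence in text


def check_rows(rows: list[dict[str, Cell]], corpus: dict[str, str]) -> list[tuple[int, str]]:
    """Return (row index, column name) pairs whose cells fail the grounding check."""
    failures = []
    for i, row in enumerate(rows):
        for column, cell in row.items():
            if not is_grounded(cell, corpus):
                failures.append((i, column))
    return failures


# Hypothetical usage: one document and one row with two schema columns.
corpus = {"case1": "The court ruled in favor of the plaintiff in 2019."}
rows = [{
    "outcome": Cell("plaintiff won", "case1", "ruled in favor of the plaintiff"),
    "year": Cell("2019", "case1", "in 2020"),  # quote not in the document
}]
print(check_rows(rows, corpus))
```

In this toy example, only the `year` cell is flagged because its quote does not occur in `case1`. A check like this is one way an interface could surface cells for a user to revise.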
