Document Understanding, Measurement, and Manipulation Using Category Theory

Abstract

We apply category theory to extract the structure of multimodal documents, which leads us to develop information-theoretic measures, methods for content summarization and extension, and a self-supervised procedure for improving large pretrained models. First, we develop a mathematical representation of a document as a category of question-answer pairs. Second, we develop an orthogonalization procedure that divides the information contained in one or more documents into non-overlapping pieces. The structures extracted in these two steps lead to methods for measuring and enumerating the information contained in a document. Building on the same steps, we develop new summarization techniques and a solution to a new problem, exegesis, which yields an extension of the original document. Our question-answer pair methodology enables a novel rate-distortion analysis of summarization techniques. We implement our techniques using large pretrained models and propose a multimodal extension of the overall mathematical framework. Finally, we develop a novel self-supervised method that uses RLVR to improve large pretrained models through consistency constraints, such as composability and closure under certain operations, that stem naturally from our category-theoretic framework.
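
As a rough illustration of dividing information into non-overlapping pieces and of trading summary size against information lost, the toy sketch below may help. It is not the procedure developed in this work. It assumes that each question-answer pair can be reduced to a set of atomic fact identifiers, and the names `orthogonalize`, `rate_and_distortion`, and the sample data are hypothetical.

```python
from collections import defaultdict


def orthogonalize(qa_facts):
    """Split possibly overlapping fact sets into disjoint pieces.

    Assumption (toy model): each question-answer pair is represented by the
    set of atomic facts it conveys. Facts shared by exactly the same group
    of pairs are placed in the same piece, so no fact appears twice.
    """
    owners = defaultdict(set)
    for pair, facts in qa_facts.items():
        for fact in facts:
            owners[fact].add(pair)
    pieces = defaultdict(set)
    for fact, pairs in owners.items():
        pieces[frozenset(pairs)].add(fact)
    return dict(pieces)


def rate_and_distortion(pieces, summary_facts):
    """Return (rate, distortion) for a summary given as a set of facts.

    rate: fraction of the document's facts that the summary retains.
    distortion: fraction of pieces the summary does not fully cover.
    """
    all_facts = set().union(*pieces.values())
    kept = sum(1 for facts in pieces.values() if facts <= summary_facts)
    rate = len(summary_facts & all_facts) / len(all_facts)
    distortion = 1 - kept / len(pieces)
    return rate, distortion


# Hypothetical document: three question-answer pairs over four atomic facts.
qa_facts = {
    "q1": {"a", "b"},
    "q2": {"b", "c"},
    "q3": {"d"},
}
pieces = orthogonalize(qa_facts)
result = rate_and_distortion(pieces, {"a", "b"})
print(len(pieces), result)  # 4 (0.5, 0.5)
```

In this toy model the number of pieces serves as a crude count of the information in the document, and sweeping over candidate summaries would trace out an empirical rate-distortion curve.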