Boundary-Aware Context Grounding for Low-Channel EEG Agents

Abstract

Large language models (LLMs) can make scientific software easier to use. However, a general-purpose model does not automatically know which measurements a particular sensor can support, which algorithms the current software implements, or which conclusions a computed result justifies. These distinctions matter especially for low-channel electroencephalography (EEG), where sparse spatial coverage and variable signal quality make plausible but unsupported interpretations easy to produce. We present NeuraDock Agent, an open-source architecture that separates a deterministic local EEG engine from a hardware-aware language layer. The numerical engine parses recordings, performs quality control, executes reviewed spectral workflows, and writes machine-readable artifacts. The LLM receives only a compact, allowlisted summary and a versioned context pack. The context pack describes the seven-channel hardware, reviewed workflows, result fields, implementation boundaries, scientific limits, and reference cases. Raw EEG and dense per-sample arrays remain local. We evaluate the system at three levels. First, 12 recordings produced identical structured results over ten numerical repetitions, and a complete Rest/Task run produced identical result, report, and figure hashes over three repetitions. Second, request-capture and failure-injection experiments confirmed the tested data boundary and the preservation of local artifacts under HTTP, malformed-output, and connection failures. Third, a boundary-awareness benchmark tested 36 ordinary and adversarial questions under four context ablations and two LLMs, yielding 288 outputs (36 × 4 × 2). These results support hardware- and implementation-aware grounding as a practical mechanism for calibrating what an EEG agent accepts, qualifies, or refuses; they do not establish clinical validity or a validated absolute cognitive-load index.
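
The data boundary and the hash-based repeatability check described above can be illustrated with a minimal Python sketch. It is not NeuraDock Agent's code: the field names in ALLOWED_FIELDS, the helper names, and the values returned by fake_engine are all hypothetical, chosen only to show the idea. The sketch assumes that results are JSON-serializable dictionaries and treats any list-valued field as a dense array that must stay local.

```python
import hashlib
import json

# Hypothetical allowlist; these field names are illustrative assumptions,
# not the actual NeuraDock Agent result schema.
ALLOWED_FIELDS = {"workflow", "n_channels", "qc_passed", "band_power_summary"}


def _contains_list(value) -> bool:
    """Return True if the value is, or nests, a list or tuple (a dense array)."""
    if isinstance(value, (list, tuple)):
        return True
    if isinstance(value, dict):
        return any(_contains_list(v) for v in value.values())
    return False


def build_llm_summary(result: dict) -> dict:
    """Keep only allowlisted fields and drop any that carry array-like data."""
    summary = {}
    for key in sorted(ALLOWED_FIELDS):
        if key in result and not _contains_list(result[key]):
            summary[key] = result[key]
    return summary


def result_hash(result: dict) -> str:
    """SHA-256 of a canonical JSON encoding, so equal results hash equally."""
    canonical = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_repeatable(run, repetitions: int) -> bool:
    """Call run() several times and check that every result hash is identical."""
    hashes = {result_hash(run()) for _ in range(repetitions)}
    return len(hashes) == 1


def fake_engine() -> dict:
    """Stand-in for a deterministic local engine; all values are made up."""
    return {
        "workflow": "rest_task_spectral",
        "n_channels": 7,
        "qc_passed": True,
        "band_power_summary": {"alpha_rel": 0.31, "theta_rel": 0.22},
        "raw_samples": [0.1, 0.2, 0.3],  # dense data that must stay local
    }
```

Calling build_llm_summary(fake_engine()) returns the four allowlisted fields without raw_samples, and is_repeatable(fake_engine, 10) mirrors the ten-repetition check in spirit. Dropping every list-valued field is a deliberately conservative stand-in for the rule that dense per-sample arrays remain local; a real system would more likely validate the summary against an explicit schema.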