LLM Context Conditioning and PWP Prompting for Multimodal Validation of Chemical Formulas

Abstract

Identifying subtle technical errors in complex scientific and technical documents, especially those requiring multimodal interpretation (e.g., formulas in images), is a significant hurdle for Large Language Models (LLMs), whose inherent error-correction tendencies can mask inaccuracies. This exploratory proof-of-concept (PoC) study investigates structured LLM context conditioning, informed by Persistent Workflow Prompting (PWP) principles, as a methodological strategy to modulate this behavior at inference time. The approach is designed to enhance the reliability of readily available, general-purpose LLMs (specifically Gemini 2.5 Pro and ChatGPT Plus o3) for precise validation tasks, relying only on their standard chat interfaces, without API access or model modifications. To explore this methodology, we focused on validating chemical formulas within a single, complex test paper with known textual and image-based errors. Several prompting strategies were evaluated: basic prompts proved unreliable, whereas an approach that adapts PWP structures to rigorously condition the LLM's analytical mindset appeared to improve textual error identification with both models. Notably, this method also guided Gemini 2.5 Pro to repeatedly identify a subtle image-based formula error that had previously been overlooked during manual review, a task at which ChatGPT Plus o3 failed in our tests. These preliminary findings highlight specific LLM operational modes that impede detail-oriented validation, and they suggest that PWP-informed context conditioning offers a promising and highly accessible technique for developing more robust LLM-driven analytical workflows, particularly for tasks requiring meticulous error detection in scientific and technical documents. Extensive validation beyond this limited PoC is necessary to ascertain broader applicability.
