LLM Context Conditioning and PWP Prompting for Multimodal Validation of Chemical Formulas
May 18, 2025
Author: Evgeny Markhasin
cs.AI
Abstract
Identifying subtle technical errors within complex scientific and technical
documents, especially those requiring multimodal interpretation (e.g., formulas
in images), presents a significant hurdle for Large Language Models (LLMs),
whose inherent error-correction tendencies can mask inaccuracies. This
exploratory proof-of-concept (PoC) study investigates structured LLM context
conditioning, informed by Persistent Workflow Prompting (PWP) principles, as a
methodological strategy to modulate this LLM behavior at inference time. The
approach is designed to enhance the reliability of readily available,
general-purpose LLMs (specifically Gemini 2.5 Pro and ChatGPT Plus o3) for
precise validation tasks, crucially relying only on their standard chat
interfaces without API access or model modifications. To explore this
methodology, we focused on validating chemical formulas within a single,
complex test paper with known textual and image-based errors. Several prompting
strategies were evaluated: while basic prompts proved unreliable, an approach
adapting PWP structures to rigorously condition the LLM's analytical mindset
appeared to improve textual error identification with both models. Notably,
this method also guided Gemini 2.5 Pro to repeatedly identify a subtle
image-based formula error previously overlooked during manual review, a task
where ChatGPT Plus o3 failed in our tests. These preliminary findings highlight
specific LLM operational modes that impede detail-oriented validation and
suggest that PWP-informed context conditioning offers a promising and highly
accessible technique for developing more robust LLM-driven analytical
workflows, particularly for tasks requiring meticulous error detection in
scientific and technical documents. Extensive validation beyond this limited
PoC is necessary to ascertain broader applicability.
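As a purely illustrative sketch (the study's actual prompts are not reproduced here, and all wording below is hypothetical), a PWP-style conditioning preamble entered into a standard chat interface before supplying the document might look like:

    Role: You are a meticulous verification auditor, not a helpful assistant.
    Rule 1: Never correct, normalize, or silently repair any formula you read.
    Rule 2: Transcribe each chemical formula exactly as printed, in the text
            and in every image, before evaluating it.
    Rule 3: Compare each transcription against the surrounding context and
            report every discrepancy, however small, without resolving it.
    Persistence: Apply these rules to every response in this session.

The intent of such conditioning is to suppress the model's default error-correction behavior at inference time, which the abstract identifies as the chief obstacle to detail-oriented validation.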