Pre-trained Large Language Models Learn Hidden Markov Models In-context
June 8, 2025
Authors: Yijia Dai, Zhaolin Gao, Yahya Sattar, Sarah Dean, Jennifer J. Sun
cs.AI
Abstract
Hidden Markov Models (HMMs) are foundational tools for modeling sequential
data with latent Markovian structure, yet fitting them to real-world data
remains computationally challenging. In this work, we show that pre-trained
large language models (LLMs) can effectively model data generated by HMMs via
in-context learning (ICL), their ability to infer patterns from
examples within a prompt. On a diverse set of synthetic HMMs, LLMs achieve
predictive accuracy approaching the theoretical optimum. We uncover novel
scaling trends influenced by HMM properties, and offer theoretical conjectures
for these empirical observations. We also provide practical guidelines for
scientists on using ICL as a diagnostic tool for complex data. On real-world
animal decision-making tasks, ICL achieves competitive performance with models
designed by human experts. To our knowledge, this is the first demonstration
that ICL can learn and predict HMM-generated sequences, an
advance that deepens our understanding of in-context learning in LLMs and
establishes its potential as a powerful tool for uncovering hidden structure in
complex scientific data.
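
The abstract's "theoretical optimum" refers to the Bayes-optimal predictor given the true HMM, which can be computed with the forward algorithm. Below is a minimal illustrative sketch, not the paper's actual code: the HMM size, transition/emission matrices, and sequence length are hypothetical placeholders chosen for clarity. It samples an emission sequence from a small synthetic HMM and computes the optimal next-symbol distribution, against which an LLM's in-context prediction on the same prefix could be scored.

```python
# Illustrative sketch (assumed setup, not the paper's code): sample from a
# small synthetic HMM and compute the Bayes-optimal next-emission distribution
# via the forward algorithm, i.e. the "theoretical optimum" baseline.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state, 4-symbol HMM: transition T, emission E, initial pi.
T = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
E = np.array([[0.7, 0.1, 0.1, 0.1],
              [0.1, 0.7, 0.1, 0.1],
              [0.1, 0.1, 0.7, 0.1]])
pi = np.ones(3) / 3

def sample_hmm(length):
    """Sample an emission sequence: emit from the current state, then transition."""
    obs = []
    s = rng.choice(3, p=pi)
    for _ in range(length):
        obs.append(rng.choice(4, p=E[s]))
        s = rng.choice(3, p=T[s])
    return obs

def optimal_next_probs(obs):
    """Forward algorithm: p(next emission | observed prefix) under the true HMM."""
    alpha = pi * E[:, obs[0]]          # filtered state belief after first emission
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ T) * E[:, o]  # predict state, then condition on emission
        alpha /= alpha.sum()
    return (alpha @ T) @ E             # predictive distribution over the 4 symbols

obs = sample_hmm(200)
probs = optimal_next_probs(obs)
print("Bayes-optimal next-symbol distribution:", np.round(probs, 3))

# The emission prefix can be serialized as an in-context prompt (e.g. "2 0 3 1 ...")
# and the LLM's next-token prediction compared against this distribution.
prompt = " ".join(map(str, obs))
```

An LLM's ICL accuracy on such prompts can then be reported relative to the accuracy of predicting `argmax(probs)` at each step, which is how a gap to the theoretical optimum would be measured in this kind of setup.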