Pre-trained Large Language Models Learn Hidden Markov Models In-context
June 8, 2025
Authors: Yijia Dai, Zhaolin Gao, Yahya Sattar, Sarah Dean, Jennifer J. Sun
cs.AI
Abstract
Hidden Markov Models (HMMs) are foundational tools for modeling sequential
data with latent Markovian structure, yet fitting them to real-world data
remains computationally challenging. In this work, we show that pre-trained
large language models (LLMs) can effectively model data generated by HMMs via
in-context learning (ICL) – their ability to infer patterns from
examples within a prompt. On a diverse set of synthetic HMMs, LLMs achieve
predictive accuracy approaching the theoretical optimum. We uncover novel
scaling trends influenced by HMM properties, and offer theoretical conjectures
for these empirical observations. We also provide practical guidelines for
scientists on using ICL as a diagnostic tool for complex data. On real-world
animal decision-making tasks, ICL achieves competitive performance with models
designed by human experts. To our knowledge, this is the first demonstration
that ICL can learn and predict HMM-generated sequences – an
advance that deepens our understanding of in-context learning in LLMs and
establishes its potential as a powerful tool for uncovering hidden structure in
complex scientific data.
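
The sketch below (not the paper's code) illustrates the kind of evaluation the abstract describes: sample a sequence from a synthetic HMM, compute the Bayes-optimal next-observation distribution with the forward algorithm (the "theoretical optimum" the LLM's in-context predictions are compared against), and serialize the history as a prompt. The HMM sizes, random parameters, and prompt format are illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch, assuming a small synthetic HMM and a space-separated symbol prompt.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state, 4-symbol HMM with random transition/emission matrices.
n_states, n_obs = 3, 4
A = rng.dirichlet(np.ones(n_states), size=n_states)   # transition matrix (rows = current state)
B = rng.dirichlet(np.ones(n_obs), size=n_states)      # emission matrix (rows = state)
pi = np.ones(n_states) / n_states                     # uniform initial state distribution

def sample_hmm(T):
    """Sample a length-T observation sequence from the HMM."""
    obs, z = [], rng.choice(n_states, p=pi)
    for _ in range(T):
        obs.append(rng.choice(n_obs, p=B[z]))
        z = rng.choice(n_states, p=A[z])
    return obs

def optimal_next_obs_dist(obs):
    """Forward algorithm: p(x_{t+1} | x_1..x_t), the oracle baseline for ICL predictions."""
    alpha = pi * B[:, obs[0]]
    alpha /= alpha.sum()
    for x in obs[1:]:
        alpha = (alpha @ A) * B[:, x]
        alpha /= alpha.sum()
    return (alpha @ A) @ B   # predictive distribution over the next symbol

seq = sample_hmm(200)
oracle = optimal_next_obs_dist(seq)

# Serialize the sequence as an in-context prompt for a pre-trained LLM (format is an
# assumption); the LLM's next-token distribution over the symbol vocabulary would then
# be scored against `oracle` and against the true next emission.
prompt = " ".join(str(x) for x in seq)
print(prompt[:60], "...")
print("oracle next-symbol distribution:", np.round(oracle, 3))
```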