Unchecked and Overlooked: Addressing the Checkbox Blind Spot in Large Language Models with CheckboxQA

Abstract

Checkboxes are critical in real-world document processing, where the presence or absence of ticks directly informs data extraction and decision-making processes. Yet, despite the strong performance of Large Vision and Language Models across a wide range of tasks, they struggle with interpreting checkable content. This challenge becomes particularly pressing in industries where a single overlooked checkbox may lead to costly regulatory or contractual oversights. To address this gap, we introduce the CheckboxQA dataset, a targeted resource designed to evaluate and improve model performance on checkbox-related tasks. It reveals the limitations of current models and serves as a valuable tool for advancing document comprehension systems, with significant implications for applications in sectors such as legal tech and finance. The dataset is publicly available at: https://github.com/Snowflake-Labs/CheckboxQA
