Natural Language Decomposition and Interpretation of Complex Utterances

Abstract

Natural language interfaces often require supervised data to translate user requests into programs, database queries, or other structured intent representations. During data collection, it can be difficult to anticipate and formalize the full range of user needs -- for example, in a system designed to handle simple requests (like find my meetings tomorrow or move my meeting with my manager to noon), users may also express more elaborate requests (like swap all my calls on Monday and Tuesday). We introduce an approach for equipping a simple language-to-code model to handle complex utterances via a process of hierarchical natural language decomposition. Our approach uses a pre-trained language model to decompose a complex utterance into a sequence of smaller natural language steps, then interprets each step using the language-to-code model. To test our approach, we collect and release DeCU -- a new NL-to-program benchmark to evaluate Decomposition of Complex Utterances. Experiments show that the proposed approach enables the interpretation of complex utterances with almost no complex training data, while outperforming standard few-shot prompting approaches.
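
The decompose-then-interpret flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the canned decomposition, and the `run(...)` program strings are all hypothetical stand-ins for a pre-trained language model and a language-to-code model, and the sketch shows only a single flat level of decomposition rather than a full hierarchy.

```python
from typing import Callable, List


def interpret_complex_utterance(
    utterance: str,
    decompose: Callable[[str], List[str]],
    interpret_step: Callable[[str], str],
) -> List[str]:
    """Split a complex utterance into simpler steps, then interpret each step."""
    steps = decompose(utterance)  # stand-in for the pre-trained language model
    return [interpret_step(step) for step in steps]  # stand-in for language-to-code


def toy_decompose(utterance: str) -> List[str]:
    # Hypothetical canned decomposition; a real system would prompt a language model.
    canned = {
        "swap all my calls on Monday and Tuesday": [
            "find my calls on Monday",
            "find my calls on Tuesday",
            "move the Monday calls to Tuesday",
            "move the Tuesday calls to Monday",
        ]
    }
    # Utterances without a canned decomposition are treated as already simple.
    return canned.get(utterance, [utterance])


def toy_interpret_step(step: str) -> str:
    # Hypothetical placeholder "program": wraps the step text in a run(...) call.
    return f"run({step!r})"


programs = interpret_complex_utterance(
    "swap all my calls on Monday and Tuesday",
    toy_decompose,
    toy_interpret_step,
)
for program in programs:
    print(program)
```

Passing the two models in as callables keeps the control flow separate from any particular model, so the simple language-to-code component only ever sees simple steps.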
