Octo-planner: On-device Language Model for Planner-Action Agents

Abstract

AI agents have become increasingly significant in various domains, enabling autonomous decision-making and problem-solving. To function effectively, these agents require a planning process that determines the best course of action and then executes the planned actions. In this paper, we present an efficient on-device Planner-Action framework that separates planning and action execution into two distinct components: a planner agent based on Phi-3 Mini, a 3.8-billion-parameter LLM optimized for edge devices, and an action agent that uses the Octopus model for function execution. The planner agent first responds to a user query by decomposing the task into a sequence of sub-steps, which the action agent then executes. To optimize performance on resource-constrained devices, we employ model fine-tuning instead of in-context learning, which reduces computational cost and energy consumption while improving response times. Our approach uses GPT-4 to generate diverse planning queries and responses based on the available functions, followed by validation to ensure data quality. We fine-tune the Phi-3 Mini model on this curated dataset and achieve a 97% success rate in our in-domain test environment. To address multi-domain planning challenges, we developed a multi-LoRA training method that merges weights from LoRAs trained on distinct function subsets. This approach enables flexible handling of complex, multi-domain queries while maintaining computational efficiency on resource-constrained devices. To support further research, we have open-sourced our model weights at https://huggingface.co/NexaAIDev/octopus-planning. A demo is available at https://www.nexa4ai.com/octo-planner.
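The multi-LoRA idea mentioned above can be illustrated with a minimal sketch. The abstract does not state the exact merging rule, so the code below assumes a simple weighted linear combination of the standard LoRA updates, W' = W + sum_i w_i * s_i * (B_i A_i). The function names, the toy 2x2 matrices, and the equal merge weights are hypothetical and chosen only for illustration; they are not taken from the released model.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
# One adapter is (B, A, scale): B is (d_out x r), A is (r x d_in).
Adapter = Tuple[Matrix, Matrix, float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (m x k) matrix by a (k x n) matrix."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(inner)) for j in range(cols)]
            for row in a]


def merge_loras(base: Matrix, adapters: Sequence[Adapter],
                weights: Sequence[float]) -> Matrix:
    """Return base + sum_i weights[i] * scale_i * (B_i @ A_i).

    Assumption: a linear merge; the paper's actual rule may differ.
    """
    merged = [row[:] for row in base]  # copy so the base weights stay intact
    for (B, A, scale), w in zip(adapters, weights):
        delta = matmul(B, A)
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += w * scale * value
    return merged


# Toy example: two rank-1 adapters, standing in for two function subsets.
base = [[0.0, 0.0], [0.0, 0.0]]
adapter_a: Adapter = ([[1.0], [0.0]], [[1.0, 0.0]], 1.0)
adapter_b: Adapter = ([[0.0], [1.0]], [[0.0, 2.0]], 1.0)
merged = merge_loras(base, [adapter_a, adapter_b], [0.5, 0.5])
print(merged)
```

Under this assumption, each adapter contributes a low-rank update to the same base weights, so the merged model needs no extra parameters at inference time beyond the base model itself.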
