Orchestrating Kubernetes AI Inference Workloads with NVIDIA Grove — From DRA GA to KAI Scheduler Integration

Source: DEV Community
Why Kubernetes Alone Falls Short for AI Inference Workloads

In March 2026, at KubeCon Europe 2026 in Amsterdam, NVIDIA officially announced the open-source project Grove. Grove is a Kubernetes API for declaratively defining and orchestrating complex AI inference systems on Kubernetes.

The era of simply spinning up a single Pod is over. Modern LLM inference architectures are evolving toward disaggregated inference patterns: the prefill and decode stages are separated, the KV cache is shared, and each component must scale independently. The traditional Kubernetes combination of Deployment + HPA cannot properly manage such composite workloads.

Core problem: when you scale out prefill workers, you must proportionally increase decode capacity, and when adding new service instances, you must maintain the correct ratio between components. Pod-level autoscaling cannot express these dependencies. Grove was designed as a core component of the NVIDIA Dynamo framework to solve this problem.
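To make the ratio problem concrete, here is a minimal sketch of the coupling a per-Deployment HPA cannot express. This is illustrative pseudologic, not Grove's actual API: the names `ScalingGroup`, `decode_per_prefill`, and `scale_prefill` are hypothetical, and the 2:1 decode-to-prefill ratio is an assumed example value.

```python
# Illustrative sketch (not Grove's API): ratio-coupled scaling for
# disaggregated inference. Scaling prefill must atomically imply a
# matching decode replica count -- group-level logic that two
# independent HPAs, each watching its own Deployment, cannot express.
import math
from dataclasses import dataclass

@dataclass
class ScalingGroup:
    """One inference service instance: prefill and decode workers
    that must scale together at a fixed ratio."""
    prefill_replicas: int
    decode_per_prefill: float  # assumed ratio, e.g. 2 decode workers per prefill

    def required_decode(self) -> int:
        # Round up so decode capacity never lags prefill throughput.
        return math.ceil(self.prefill_replicas * self.decode_per_prefill)

def scale_prefill(group: ScalingGroup, new_prefill: int) -> tuple[int, int]:
    """Return the (prefill, decode) replica pair to apply as one unit --
    the dependency Grove lets you declare instead of script."""
    group.prefill_replicas = new_prefill
    return new_prefill, group.required_decode()

# Scaling prefill from 2 to 5 workers at a 2:1 ratio requires
# 10 decode workers in the same operation.
print(scale_prefill(ScalingGroup(prefill_replicas=2, decode_per_prefill=2.0), 5))
```

The point of the sketch is that the two replica counts form a single scaling decision; expressing that decision declaratively, rather than in glue code, is the gap Grove targets.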