NVIDIA Developer Blog·Hardware·3d ago·by Guy Saltoun·~3 min read

Get Real-Time Visibility into GPU Usage Across Kubernetes Clusters

Maximizing the value of AI infrastructure demands deep visibility into GPU utilization. Yet many platform teams running AI workloads on Kubernetes have limited insight into how their GPUs are used. Most don’t know who’s consuming them, how much memory is in use, or whether Kubernetes pods are pending or silently idle. Without that signal, GPU fleets routinely run underutilized, and scheduling bottlenecks go unnoticed until users escalate.

The GPU Usage Monitor, built on the NVIDIA Data Center GPU Manager (DCGM) Exporter, provides real-time visibility into GPU allocation, compute utilization, memory consumption, and pod status across an entire Kubernetes cluster, all through a single Helm chart deployment.

The observability gap in GPU-accelerated Kubernetes clusters

For site reliability engineers (SREs) and platform teams managing GPU-accelerated Kubernetes clusters, two failure modes are common and costly:

- Over-provisioning: Engineers request entire GPUs to avoid contention, but models frequently use only 30-50% of the available memory and compute. Without visibility into actual consumption, there’s no signal to right-size these allocations. The result is a cluster with high nominal demand but low effective utilization, paying for hardware that sits idle.
- Pod starvation and scheduling blind spots: GPU requests can stack up, leaving pods queued in the Pending state and stalling model training jobs or inference endpoints before they start. Without a cluster-wide view of pending versus running GPU pods, these scheduling bottlenecks are often discovered too late: typically when a user reports a failure rather than through a monitoring alert.

The standard Kubernetes metrics stack, including kube-state-metrics and node-exporter, doesn’t surface GPU-specific signals. DCGM Exporter exposes per-GPU hardware metrics, but wiring it into Prometheus and Grafana with production-quality dashboards takes significant manual configuration.
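To make the right-sizing signal and the hand wiring it requires more concrete, here is a minimal Python sketch of the kind of one-off script teams write. It parses a DCGM Exporter scrape in the Prometheus text format and flags GPUs whose memory and compute usage both fall below a threshold. It assumes the DCGM Exporter fields DCGM_FI_DEV_FB_USED and DCGM_FI_DEV_FB_FREE (both in MiB) and DCGM_FI_DEV_GPU_UTIL (percent), plus Hostname, gpu, and pod labels. The 0.5 threshold, the sample scrape, and the helper names are hypothetical, and the parser deliberately ignores escaped quotes and timestamps.

```python
import re
from collections import defaultdict

# Matches one sample line of the Prometheus text format, for example:
#   DCGM_FI_DEV_FB_USED{gpu="0",Hostname="node-a",pod="train-0"} 24000
# Simplification: label values must not contain escaped quotes, and any
# trailing timestamp is ignored.
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

def parse_samples(text):
    """Yield (metric_name, labels, value) for every sample line in a scrape."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = SAMPLE_RE.match(line)
        if match:
            labels = dict(LABEL_RE.findall(match.group("labels") or ""))
            yield match.group("name"), labels, float(match.group("value"))

def right_sizing_report(text, threshold=0.5):
    """Report memory and compute usage per GPU; flag GPUs below threshold on both."""
    gpus = defaultdict(dict)
    for name, labels, value in parse_samples(text):
        key = (labels.get("Hostname", ""), labels.get("gpu", ""))
        gpus[key][name] = value
        if "pod" in labels:
            gpus[key]["pod"] = labels["pod"]
    report = []
    for (host, gpu), fields in sorted(gpus.items()):
        used = fields.get("DCGM_FI_DEV_FB_USED")
        free = fields.get("DCGM_FI_DEV_FB_FREE")
        util = fields.get("DCGM_FI_DEV_GPU_UTIL")
        if used is None or free is None or util is None or used + free == 0:
            continue  # incomplete scrape for this GPU
        mem_frac = used / (used + free)  # approximates total memory as used + free
        util_frac = util / 100.0         # DCGM_FI_DEV_GPU_UTIL is a percentage
        report.append({
            "host": host, "gpu": gpu, "pod": fields.get("pod", ""),
            "mem_frac": round(mem_frac, 2), "util_frac": round(util_frac, 2),
            "oversized": mem_frac < threshold and util_frac < threshold,
        })
    return report

# Hypothetical scrape: GPU 0 is lightly used, GPU 1 is busy.
SAMPLE_SCRAPE = """
# HELP DCGM_FI_DEV_FB_USED Framebuffer memory used (in MiB).
# TYPE DCGM_FI_DEV_FB_USED gauge
DCGM_FI_DEV_FB_USED{gpu="0",Hostname="node-a",pod="train-0",namespace="ml"} 24000
DCGM_FI_DEV_FB_FREE{gpu="0",Hostname="node-a",pod="train-0",namespace="ml"} 56000
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="node-a",pod="train-0",namespace="ml"} 35
DCGM_FI_DEV_FB_USED{gpu="1",Hostname="node-a",pod="infer-0",namespace="ml"} 70000
DCGM_FI_DEV_FB_FREE{gpu="1",Hostname="node-a",pod="infer-0",namespace="ml"} 10000
DCGM_FI_DEV_GPU_UTIL{gpu="1",Hostname="node-a",pod="infer-0",namespace="ml"} 90
"""

for row in right_sizing_report(SAMPLE_SCRAPE):
    print(row)
```

Approximating total framebuffer as used plus free ignores any memory the driver reserves, so treat the fraction as an estimate. Requiring both memory and compute to be low is a conservative choice; a production setup would evaluate these over a time window in Prometheus rather than on a single scrape.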
Teams end up with inconsistent, one-off monitoring setups, or no GPU monitoring at all.

What is the GPU Usage Monitor?

The GPU Usage Monitor is an open-source project that deploys a fully integrated GPU observability stack for Kubernetes. Rather than requiring SRE and platform teams to assemble and configure individual components, it combines DCGM Exporter, kube-state-metrics, Prometheus, and Grafana into a single deployment, complete with pre-built dashboards designed specifically for GPU-accelerated workloads.

The design principle is operational simplicity. A single helm install command delivers actionable GPU visibility within minutes, with no custom dashboard authoring or scrape configuration required.

GPU Usage Monitor architecture

The tool consists of four main components:

- DCGM Exporter: Exposes NVIDIA GPU metrics (external; deployed via the NVIDIA GPU Operator)
- kube-state-metrics: Exposes Kubernetes pod and resource metrics
- Prometheus: Collects and stores metrics from DCGM Exporter and kube-state-metrics
- Grafana: Provides visualization through the GPU Usage Monitor Dashboard

DCGM Exporter handles the hardware layer, and kube-state-metrics handles the Kubernetes layer. Prometheus and Grafana tie them together into a unified observability plane. Platform teams already understand each component on its own; the value of the chart is the integration.

How to get started with the GPU Usage Monitor

The GPU Usage Monitor is open source under the Apache 2.0 license and available now…
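Because Prometheus stores both the hardware layer and the Kubernetes layer, the pending-versus-running view and the utilization view reduce to ordinary PromQL queries. The following Python sketch, an illustration only, builds instant-query URLs for the Prometheus HTTP API (/api/v1/query) and extracts a single value from the JSON response. It assumes kube-state-metrics v2 naming, where the nvidia.com/gpu resource appears as resource="nvidia_com_gpu" on kube_pod_container_resource_requests. The Prometheus address, the query set, and the helper names are hypothetical and are not part of the GPU Usage Monitor chart.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical address; point this at the Prometheus service in your cluster.
PROM_URL = "http://prometheus.example:9090"

# GPUs requested by pods in a given phase: join per-container GPU requests
# (kube-state-metrics) with the pod phase series on (namespace, pod).
GPU_REQUESTS_BY_PHASE = (
    'sum(kube_pod_container_resource_requests{{resource="nvidia_com_gpu"}} '
    '* on(namespace, pod) group_left() (kube_pod_status_phase{{phase="{phase}"}} == 1))'
)

QUERIES = {
    "pending_gpus": GPU_REQUESTS_BY_PHASE.format(phase="Pending"),
    "running_gpus": GPU_REQUESTS_BY_PHASE.format(phase="Running"),
    # Mean compute utilization (percent) across all GPUs, from DCGM Exporter.
    "avg_gpu_util": "avg(DCGM_FI_DEV_GPU_UTIL)",
}

def instant_query_url(base, promql):
    """Build a GET URL for the Prometheus HTTP API instant-query endpoint."""
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"

def scalar_from_response(body):
    """Return the first sample value from an /api/v1/query response, or 0.0 if empty."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(payload.get("error", "Prometheus query failed"))
    result = payload["data"]["result"]
    # Each instant-vector sample is [unix_timestamp, "value-as-string"].
    return float(result[0]["value"][1]) if result else 0.0

def run_queries(base=PROM_URL, timeout=5.0):
    """Execute every query in QUERIES against a live Prometheus and return the values."""
    values = {}
    for name, promql in QUERIES.items():
        with urlopen(instant_query_url(base, promql), timeout=timeout) as resp:
            values[name] = scalar_from_response(resp.read())
    return values
```

Calling run_queries() against a reachable Prometheus returns a dict such as {"pending_gpus": ..., "running_gpus": ..., "avg_gpu_util": ...}; an alert on pending_gpus > 0 sustained over several minutes would surface the scheduling bottleneck before a user reports it. An aggregation such as sum() returns an empty vector when no pods match, which is why scalar_from_response treats an empty result as 0.0.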
