Automate Kubernetes AI Cluster Health with NVSentinel
8 December 2025 at 18:00
Kubernetes underpins a large portion of AI workloads in production. Yet maintaining GPU nodes, and ensuring that applications are running, training jobs are progressing, and traffic is being served across Kubernetes clusters, is easier said than done. NVSentinel is designed to help with these challenges. An open source system for Kubernetes AI clusters, NVSentinel continuously monitors GPU…