
Ensuring Balanced GPU Allocation in Kubernetes Clusters with Time-Based Fairshare

28 January 2026 at 17:00

NVIDIA Run:ai v2.24 introduces time-based fairshare, a new scheduling mode that brings fair-share scheduling with time awareness for over-quota resources to Kubernetes clusters. This capability, built on the open source KAI Scheduler that powers NVIDIA Run:ai, addresses a long-standing challenge in shared GPU infrastructure. Consider two teams with equal priority sharing a cluster.
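
To make the cut-off example concrete, here is a minimal Python sketch of time-aware fair-share arbitration between two teams. The Team class, the GPU-second accounting, and the selection rule are illustrative assumptions, not the actual KAI Scheduler implementation; the point is only that time awareness means charging teams for how long they hold over-quota GPUs, not just how many they hold right now.

```python
from dataclasses import dataclass

@dataclass
class Team:
    name: str
    quota: int                           # GPUs the team is entitled to
    over_quota_gpu_seconds: float = 0.0  # accumulated borrowed GPU-time

def account_usage(team: Team, gpus_over_quota: int, seconds: float) -> None:
    # Time awareness: charge for duration of borrowing, not a point-in-time count.
    team.over_quota_gpu_seconds += gpus_over_quota * seconds

def pick_team_for_spare_gpu(teams: list[Team]) -> Team:
    # Grant the next spare GPU to the team that has borrowed the least
    # GPU-time so far, so long-running borrowers yield to lighter ones.
    return min(teams, key=lambda t: t.over_quota_gpu_seconds)

teams = [Team("team-a", quota=4), Team("team-b", quota=4)]
account_usage(teams[0], gpus_over_quota=2, seconds=3600)  # team-a held 2 extra GPUs for an hour
print(pick_team_for_spare_gpu(teams).name)  # -> team-b
```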

Source

Enabling Horizontal Autoscaling of Enterprise RAG Components on Kubernetes

12 December 2025 at 21:00

Today’s best AI agents rely on retrieval-augmented generation (RAG) to produce more accurate results. A RAG system draws on a knowledge base to supply additional context to large language models (LLMs). A typical design pattern includes a RAG server that accepts prompt queries, consults a vector database for the nearest context vectors, and then forwards the query with the appended context to an…
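
Here is a self-contained Python sketch of that request path: embed the query, fetch the nearest context vectors, and prepend them to the prompt that would be forwarded to the LLM. The toy hash-seeded embedding and the in-memory index are illustrative stand-ins for a real embedding model and vector database.

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Toy deterministic embedding; a real system calls an embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

knowledge_base = [
    "GPUs can be time-sliced across pods.",
    "Horizontal Pod Autoscaler scales replicas on metrics.",
    "Vector databases index embeddings for similarity search.",
]
index = np.stack([embed(doc) for doc in knowledge_base])

def retrieve(query: str, k: int = 2) -> list[str]:
    scores = index @ embed(query)        # cosine similarity (unit vectors)
    top = np.argsort(scores)[::-1][:k]   # indices of the k nearest vectors
    return [knowledge_base[i] for i in top]

def rag_prompt(query: str) -> str:
    context = "\n".join(retrieve(query))
    # The augmented prompt is what gets forwarded to the LLM endpoint.
    return f"Context:\n{context}\n\nQuestion: {query}"

print(rag_prompt("How do I autoscale my RAG server?"))
```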

Source

Automate Kubernetes AI Cluster Health with NVSentinel

8 December 2025 at 18:00

Kubernetes underpins a large portion of all AI workloads in production. Yet, maintaining GPU nodes and ensuring that applications are running, training jobs are progressing, and traffic is served across Kubernetes clusters is easier said than done. NVSentinel is designed to help with these challenges. An open source system for Kubernetes AI clusters, NVSentinel continuously monitors GPU…
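
For flavor, here is a minimal Python sketch of a continuous GPU health-check loop in the spirit of what the post describes; it is not NVSentinel's actual code, and the thresholds and remediation step are illustrative assumptions. It polls NVML for temperature and uncorrected ECC errors, the kind of signals a monitor like this watches.

```python
import time
import pynvml  # pip install nvidia-ml-py

TEMP_LIMIT_C = 90  # illustrative threshold

def check_gpus() -> list[str]:
    """Return human-readable problems found on local GPUs."""
    problems = []
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
        if temp > TEMP_LIMIT_C:
            problems.append(f"GPU {i}: temperature {temp}C over limit")
        try:
            ecc = pynvml.nvmlDeviceGetTotalEccErrors(
                handle,
                pynvml.NVML_MEMORY_ERROR_TYPE_UNCORRECTED,
                pynvml.NVML_VOLATILE_ECC,
            )
            if ecc > 0:
                problems.append(f"GPU {i}: {ecc} uncorrected ECC errors")
        except pynvml.NVMLError:
            pass  # ECC reporting not supported on this GPU
    return problems

pynvml.nvmlInit()
try:
    while True:
        for problem in check_gpus():
            # A real remediation system would cordon or drain the node here.
            print("unhealthy:", problem)
        time.sleep(30)
finally:
    pynvml.nvmlShutdown()
```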

Source

Streamline Complex AI Inference on Kubernetes with NVIDIA Grove

10 November 2025 at 14:00

Over the past few years, AI inference has evolved from single-model, single-pod deployments into complex, multicomponent systems. A model deployment may now consist of several distinct components: prefill, decode, vision encoders, key value (KV) routers, and more. In addition, entire agentic pipelines are emerging, where multiple such model instances collaborate to perform reasoning, retrieval…
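
A toy Python sketch of the disaggregated shape the excerpt describes: a prefill component produces a KV cache that a separate decode component consumes. The classes and the hand-off are illustrative only, not NVIDIA Grove's API; in a real deployment each worker would be its own pod and a KV router would decide which decode replica receives each cache.

```python
from dataclasses import dataclass

@dataclass
class KVCache:
    prompt: str
    state: list[str]  # stand-in for per-token key/value tensors

class PrefillWorker:
    def run(self, prompt: str) -> KVCache:
        # A real prefill pod runs the full prompt through the model once.
        return KVCache(prompt=prompt, state=prompt.split())

class DecodeWorker:
    def run(self, cache: KVCache, max_tokens: int = 3) -> str:
        # A real decode pod generates tokens autoregressively from the cache.
        return " ".join(f"<tok{i}>" for i in range(max_tokens))

# One prefill worker wired directly to one decode worker.
cache = PrefillWorker().run("Why split prefill and decode?")
print(DecodeWorker().run(cache))
```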

Source
