Delivering Massive Performance Leaps for Mixture of Experts Inference on NVIDIA Blackwell

As AI models continue to get smarter, people can rely on them for an expanding set of tasks. This leads users—from consumers to enterprises—to interact with AI more frequently, meaning that more tokens need to be generated. To serve these tokens at the lowest possible cost, AI platforms need to deliver the best possible token throughput per watt. Through extreme co-design across GPUs, CPUs…
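To make the serving-cost argument concrete, here is a minimal NumPy sketch of the top-k gating that defines Mixture of Experts inference: each token activates only k of the n expert weight matrices, which is the lever behind MoE's throughput-per-watt advantage over dense models of similar capacity. All names, shapes, and the per-token loop are illustrative assumptions; production MoE layers run as fused GPU kernels, not as a Python loop.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Illustrative top-k MoE layer (names and shapes are hypothetical).
    x: (tokens, d_model); gate_w: (d_model, n_experts);
    experts: list of (d_model, d_model) weight matrices."""
    logits = x @ gate_w                              # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]       # k best experts per token
    sel = np.take_along_axis(logits, topk, axis=-1)  # their logits
    w = np.exp(sel - sel.max(-1, keepdims=True))     # softmax over the
    w /= w.sum(-1, keepdims=True)                    # selected experts only
    out = np.zeros_like(x)
    for i, token in enumerate(x):
        for j, e in enumerate(topk[i]):   # only k of n_experts run per token
            out[i] += w[i, j] * (token @ experts[e])
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
gate = rng.standard_normal((8, 4))
experts = [rng.standard_normal((8, 8)) for _ in range(4)]
print(moe_forward(x, gate, experts).shape)  # (4, 8)
```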

Inside the NVIDIA Rubin Platform: Six New Chips, One AI Supercomputer

AI has entered an industrial phase. What began as systems performing discrete AI model training and human-facing inference has evolved into always-on AI factories that continuously convert power, silicon, and data into intelligence at scale. These factories now underpin applications that generate business plans, analyze markets, conduct deep research, and reason across vast bodies of…

How to Scale Fast Fourier Transforms to Exascale on Modern NVIDIA GPU Architectures

Fast Fourier Transforms (FFTs) are widely used across scientific computing, from molecular dynamics and signal processing to computational fluid dynamics (CFD), wireless multimedia, and machine-learning applications. As computational problem sizes scale to increasingly large domains, researchers require the capability to distribute FFT computations across hundreds or thousands of GPUs spanning…
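As background for how such a distribution works, the sketch below (plain NumPy, not a distributed library) demonstrates the structural fact that distributed FFT implementations exploit: a multidimensional FFT factors into independent 1D FFTs along each axis, so pencils of the grid can be assigned to different GPUs, with all-to-all transposes between the per-axis stages. The grid size and variable names here are illustrative.

```python
import numpy as np

# Single-process 3D FFT as a reference point.
n = 64
grid = np.random.default_rng(1).standard_normal((n, n, n))

freq = np.fft.fftn(grid)           # forward 3D transform
back = np.fft.ifftn(freq).real     # inverse recovers the input
print(np.allclose(back, grid))     # True, up to floating-point error

# The same transform as three stages of independent 1D FFTs.
# Distributed libraries parallelize each stage across GPUs and
# exchange data (transpose) between stages.
step = np.fft.fft(grid, axis=0)
step = np.fft.fft(step, axis=1)
step = np.fft.fft(step, axis=2)
print(np.allclose(step, freq))     # True
```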

NVIDIA Blackwell Enables 3x Faster Training and Nearly 2x Training Performance Per Dollar than Previous-Gen Architecture

AI innovation continues to be driven by three scaling laws: pre-training, post-training, and test-time scaling. Training is foundational to building smarter models, and post-training, which can include fine-tuning, reinforcement learning, and other techniques, helps further increase accuracy on specific tasks and gives models new capabilities, such as the ability to reason.

Enabling Multi-Node NVLink on Kubernetes for NVIDIA GB200 NVL72 and Beyond

The NVIDIA GB200 NVL72 pushes AI infrastructure to new limits, enabling breakthroughs in training large language models and running scalable, low-latency inference workloads. Increasingly, Kubernetes plays a central role in deploying and scaling these workloads efficiently, whether on-premises or in the cloud. However, rapidly evolving AI workloads, infrastructure requirements…
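For orientation, the sketch below shows only the generic Kubernetes starting point: a pod spec requesting GPUs through the standard nvidia.com/gpu extended resource, expressed as a Python dict. The pod name, container image tag, and GPU count are hypothetical examples; scheduling across a multi-node NVLink domain on GB200 NVL72 layers additional machinery on top of this baseline.

```python
import json

# Baseline pod spec requesting GPUs via the nvidia.com/gpu extended
# resource (exposed by NVIDIA's device plugin). Names and the image
# tag are illustrative; NVL72 multi-node NVLink needs more than this.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "gpu-worker"},                   # hypothetical name
    "spec": {
        "containers": [{
            "name": "trainer",
            "image": "nvcr.io/nvidia/pytorch:24.01-py3",  # example image
            "resources": {"limits": {"nvidia.com/gpu": 4}},
        }],
        "restartPolicy": "Never",
    },
}

# kubectl apply -f accepts this manifest as JSON as well as YAML.
print(json.dumps(pod, indent=2))
```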
