Delivering Massive Performance Leaps for Mixture of Experts Inference on NVIDIA Blackwell

8 January 2026 at 19:43

As AI models continue to get smarter, people can rely on them for an expanding set of tasks. This leads users—from consumers to enterprises—to interact with AI more frequently, meaning that more tokens need to be generated. To serve these tokens at the lowest possible cost, AI platforms need to deliver the best possible token throughput per watt. Through extreme co-design across GPUs, CPUs…

Inside the NVIDIA Rubin Platform: Six New Chips, One AI Supercomputer

NVIDIA Technical Blog

By: Kyle Aubrey

5 January 2026 at 22:20

AI has entered an industrial phase. What began as systems performing discrete AI model training and human-facing inference has evolved into always-on AI factories that continuously convert power, silicon, and data into intelligence at scale. These factories now underpin applications that generate business plans, analyze markets, conduct deep research, and reason across vast bodies of…

How to Scale Fast Fourier Transforms to Exascale on Modern NVIDIA GPU Architectures

NVIDIA Technical Blog

By: Zan Xu

12 December 2025 at 18:00

Fast Fourier Transforms (FFTs) are widely used across scientific computing, from molecular dynamics and signal processing to computational fluid dynamics (CFD), wireless multimedia, and machine-learning applications. As computational problem sizes scale to increasingly large domains, researchers require the capability to distribute FFT computations across hundreds or thousands of GPUs spanning…

NVIDIA Blackwell Enables 3x Faster Training and Nearly 2x Training Performance Per Dollar than Previous-Gen Architecture

NVIDIA Technical Blog

By: Ashraf Eassa

11 December 2025 at 19:20

AI innovation continues to be driven by three scaling laws: pre-training, post-training, and test-time scaling. Training is foundational to building smarter models, and post-training—which can include fine-tuning, reinforcement learning, and other techniques—helps to further increase accuracy for specific tasks, as well as provide models with new capabilities like the ability to reason.

Enabling Multi-Node NVLink on Kubernetes for NVIDIA GB200 NVL72 and Beyond

NVIDIA Technical Blog

By: Kevin Klues

10 November 2025 at 14:00

The NVIDIA GB200 NVL72 pushes AI infrastructure to new limits, enabling breakthroughs in training large language models and running scalable, low-latency inference workloads. Increasingly, Kubernetes plays a central role in deploying and scaling these workloads efficiently, whether on-premises or in the cloud. However, rapidly evolving AI workloads, infrastructure requirements…
