Delivering Massive Performance Leaps for Mixture of Experts Inference on NVIDIA Blackwell

8 January 2026 at 19:43
As AI models continue to get smarter, people can rely on them for an expanding set of tasks. This leads users, from consumers to enterprises, to interact with AI more frequently, meaning that more tokens need to be generated. To serve these tokens at the lowest possible cost, AI platforms need to deliver the best possible token throughput per watt. Through extreme co-design across GPUs, CPUs…

Inside the NVIDIA Rubin Platform: Six New Chips, One AI Supercomputer

5 January 2026 at 22:20
AI has entered an industrial phase. What began as systems performing discrete AI model training and human-facing inference has evolved into always-on AI factories that continuously convert power, silicon, and data into intelligence at scale. These factories now underpin applications that generate business plans, analyze markets, conduct deep research, and reason across vast bodies of…

How to Scale Fast Fourier Transforms to Exascale on Modern NVIDIA GPU Architectures

12 December 2025 at 18:00
Fast Fourier Transforms (FFTs) are widely used across scientific computing, from molecular dynamics and signal processing to computational fluid dynamics (CFD), wireless multimedia, and machine-learning applications. As computational problem sizes scale to increasingly large domains, researchers require the capability to distribute FFT computations across hundreds or thousands of GPUs spanning…

NVIDIA Blackwell Enables 3x Faster Training and Nearly 2x Training Performance Per Dollar than Previous-Gen Architecture

11 December 2025 at 19:20
AI innovation continues to be driven by three scaling laws: pre-training, post-training, and test-time scaling. Training is foundational to building smarter models, and post-training (which can include fine-tuning, reinforcement learning, and other techniques) helps to further increase accuracy for specific tasks, as well as provide models with new capabilities like the ability to reason.

Enabling Multi-Node NVLink on Kubernetes for NVIDIA GB200 NVL72 and Beyond

10 November 2025 at 14:00
The NVIDIA GB200 NVL72 pushes AI infrastructure to new limits, enabling breakthroughs in training large language models and running scalable, low-latency inference workloads. Increasingly, Kubernetes plays a central role in deploying and scaling these workloads efficiently, whether on-premises or in the cloud. However, rapidly evolving AI workloads, infrastructure requirements…
